The UCOVI Blog


Making up the numbers - when data analysts go rogue




Ned Stratton: 2nd December 2022

This post is inspired by a story I heard around a year ago about a data analyst who left (taking his director down with him) after he was found to have been completely making up the stats in one of his monthly reports for two years.

Looking to understand the state of mind in which a data professional would report numbers they knew were inaccurate (or worse still, totally fabricated), I polled Reddit (it is too sensitive a subject for LinkedIn) for similar stories and ended up with the ones below. Not all of them show number fudging per se; several instead show other examples of demotivated practice or system gaming, such as passing off ten-minute tasks as the work of four hours. Five recurring themes of data-analyst job dissatisfaction appear.

  1. Unrealistic deadlines that betray manager and stakeholder apathy towards the complexity and effort involved in accessing and wrangling data.
  2. Regular reports that take repetitive work to produce and are manifestly not used by the business, but which the business expects to be produced nonetheless.
  3. Misrepresentation of reports and analytics pieces by their consumers (they can be dishonest, so why can't I?).
  4. Wild goose chases: being asked to wrangle swathes of data to satisfy the idle or far-fetched curiosities, pontifications and hypotheses of business users.
  5. A penny dropping about how easy it is for a data analyst to pull the wool over their non-technical manager's eyes.

Here they are:

"Literally kind of did it this afternoon. Tight (unfeasible) deadline so rather than spend all night doing it from the data set just took a known similar result put it in excel as and dressed it up to look like a trend line and then used + and - rand() to great extra lines few lines (dressed up as data points) to build up the 'data'."

----------------------

"I worked with a guy who was a product manager. He used our data warehouse to generate reports for his products. And then … changed the data to make himself look better. Then gave them to his boss. Boss ran the same reports himself. Then called me and asked why the data was different in our hero's reports. "No idea, the data warehouse says X, not Y." Our hero was fired."

----------------------

"I was tasked with recreating an Excel spreadsheet in SSRS to help free up time for the accounting team. Spent a couple days digging into excel formulas and writing queries to match. Finally delivered a report for testing and they said one of the numbers was off (when compared to their excel sheet). Which was odd, because all of the other numbers that used the calculation were correct. I spent hours digging through excel because these formulas were nested and referred to multiple tabs on multiple spreadsheets. I finally get about 5 nested formulas deep, and find the culprit. Sometime in the previous 2 years, someone didn't like the number that was calculated by the formula for that week, and decided to change it to a hard coded 8."

----------------------

"I had a director ask me where my TPS report was. I looked him straight in the face and told him, "I haven't done it in 6 months". He started to chastise me in his very German accent when I stopped him and pointed it out to him, "You didn't notice for 6 months. It must not have been important." So my approach with all business data is, if it's not discussed it's not important."

----------------------

"Not so much intentional bad data but my predecessor would create a pivot table off of some data and then put the data into a daily report by manually entering a formula in the cell. =SUM(4350+1250+250). She did this report every single day during open enrollment for Medicare. I looked at a handful of her reports and many of them were wrong because of typos. These reports were going out to the highest level of executives to report our progress. When I came on my boss told me that I would get this function during open enrollment and that I needed to get familiar with the process because it was taking her 4-5 hours per day every day and I needed to do it. Her entire process was:

  1. Open a Power BI and manually filter for a date then move about 15 columns of numbers manually into Excel.
  2. Create a Pivot table on that data (not refresh a pivot table, create a brand new one.)
  3. Manually move numbers from the pivot table and sum them like I mentioned above.
  4. Update a few other columns with "as of X date" type comments and such.
  5. Do this for 5 different Power BI datasets

Took me two hours to connect the Excel to the Power BI dataset, pull the raw data down, and set it up with pivots that refresh and then use a date slicer. It will take me 5 minutes to create the daily reports. I am still debating on whether to tell my boss about this or not."
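The typo risk in that story comes from retyping numbers into a formula by hand. As a minimal sketch of the alternative, assuming the raw rows are available as (date, plan, count) records, the daily figure can be computed from the data itself. The row layout, the plan names, the dates and the `daily_total` helper are all hypothetical and for illustration only; only the three figures 4350, 1250 and 250 come from the quote.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows: (report_date, plan, enrollments).
# Only the figures 4350, 1250 and 250 are taken from the story above.
ROWS = [
    (date(2022, 11, 1), "Plan A", 4350),
    (date(2022, 11, 1), "Plan B", 1250),
    (date(2022, 11, 1), "Plan C", 250),
    (date(2022, 11, 2), "Plan A", 4100),
]


def daily_totals(rows):
    """Sum enrollments per report date, much as a refreshed pivot table would."""
    totals = defaultdict(int)
    for report_date, _plan, enrollments in rows:
        totals[report_date] += enrollments
    return dict(totals)


def daily_total(rows, as_of):
    """Return the total for one date (0 if that date has no rows)."""
    return daily_totals(rows).get(as_of, 0)


if __name__ == "__main__":
    # Same result as the hand-typed =SUM(4350+1250+250), with nothing retyped.
    print(daily_total(ROWS, date(2022, 11, 1)))
```

Because the total is derived from the rows, a correction to the source data flows through to the report on the next run, which is the same property the refreshable pivot and date slicer give in Excel.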

----------------------

"Trying to explain to executives that I can't find correlation between "data" about humans that aren't our customers yet, who haven't interacted with our advertising or social media yet, nor interacted with our business yet, and the numbers of new customers we get over time. In other words, I don't know anything about people who I don't know exist and certainly can't measure anything about them without having access to them to be able to tell the future with regards to whether or not they'll become a customer ever."

----------------------

"Quite often in BI and DE the data just isn't available when you start the job, for whatever reason. As an agile BI practice, I would be pushing forward to produce early results to drive things forward, despite the drag factors. In some cases this would involve generation of realistic test data in lieu of the real data arriving, lest anyone grow old and die waiting for some guy somewhere to do their "extract". I have had instances where early prototype delivery of reports and dashboards were, let's say, "misrepresented" by middle managers to their superiors as being the real thing. Queue meetings about, "no, we're still waiting on xyz to provide the data to you", "yes my test data IS very realistic", "no you can't just use it for production however realistic"... etc."

----------------------

