Larry Dignan writes with some scepticism in ZDNet about Machine Learning’s emerging reputation as the miracle cure for “bad data hygiene”. I am old enough to remember the buzz around MIS and BI, and the claim that you needed a “data warehouse” for any of the miracles management wanted to work. Many vendors touted many tools at the time, and we bought them and waited for the day when their use would make our businesses intelligent and our management informed. I believe many of us are still waiting: then, as now, the warehouses had to deal with dumpsters full of data garbage for inbound traffic. I wrote about this in this post here.
Mr Dignan is similarly concerned about the lack of data quality and our seeming inability to solve it. Only he talks of the more recent past as well:
“The last magic box was the data lake where you’d throw in all of your information–structured and unstructured–and then use a Hadoop cluster and a few other technologies to make sense of it all. Before big data, the data warehouse was going to give you insights and solve all your problems along with business intelligence and enterprise resource planning. But without data hygiene in the first place enterprises replicated a familiar, but failed strategy: Poop in. Poop out. And you wouldn’t want to make your in-demand data scientists deal with poo.”
We also share some cynicism on the role of vendors and their disposition towards hyperbole when it comes to peddling analytics charms and amulets:
“Luckily, technology vendors have a magic elixir to sell you…again. The latest concept is to create an abstraction layer that can manage your data, bring analytics to the masses and use machine learning to make predictions and create business value. And the grand setup for this analytics nirvana is to use machine learning to do all the work that enterprises have neglected.”
My earlier post on this topic, however, was interested in the news (new to me) that machine learning was going to save the day. Mr Dignan cruelly dashes my hopes in his article. Whilst he doesn’t completely write off machine learning, he is not exactly praise-singing. Yet. What he does offer is a good caveat if you happen to believe your data can be cleaned enough to be interrogated for meaning, and you want to spend money to do so:
“Know this: Every technology vendor you have will have some spin on this data abstraction layer to pitch AI and analytics. Also know this: You’ll listen since your data hygiene has been terrible and you need a bail out.”
Wise words indeed. If you buy any new ideas in this space, make sure you do so firm in the knowledge that your data is probably never going to be the stuff of a data scientist’s dreams.
