There is a lot of excitement about analytics and Big Data. Much of it emanates from what I often think is vendor hyperbole and marketing mystification. Here are some actual blurbs from vendor websites; I haven't named the vendors, to protect the innocent:
* “We believe a data-native mindset is key to driving disruption. Through our end-to-end approach, we embed analytics into the fabric of our clients’ business to create new intelligence and unlock trapped value at unprecedented speed and scale.”
* “The data in your organization is full of potential. Only (vendor) has the industry expertise, advanced analytics capability, and the skills to embed analytical decision-making into key organizational processes to maximize its value – turning everyday information into useful and actionable insights.”
* “Real impact starts with insights that help businesses make better decisions. We source the most relevant data, apply world-class analytics and modeling, and create meaningful insights that help businesses make better decisions.”
The trouble is that good-quality analytics requires good-quality data, and for most organisations that is simply not what they have. Multiple ERP systems (still), data floating in the cloud, and the proliferation of unstructured data on shared drives, hard drives, corporate servers and so on all combine to create a veritable garbage dump. And this is before IoT has really even begun!
When I learned to code, the concept of GIGO (garbage in, garbage out) was drilled into us early on. Perhaps I am being simplistic, but I was sure that the vendors selling these solutions couldn't miraculously fix the underlying data. Could they? At least, that's what I believed until today.
Today I stumbled upon a concept I hadn't come across in a while: data wrangling. It seems the term is now being used to mean more than just "those who work with data" (which is how I had understood it). According to Professor Georg Gottlob, winner of the Lovelace Medal, in an interview in Computing, "(d)ata wrangling is how to get data together from different sources and to uniformise them, reason about them, and prepare them for further processing such as analytics and machine learning." There is now evidence that machine learning has significantly improved the tools and approaches available for whipping one's wilful data into shape. Trifacta, a vendor that grew out of academic research, has built a product to do exactly that, and it relies less on hyperbole and hype and more on the solid foundation of scientific progress.
And that is superb news for the promise of analytics.
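To make Professor Gottlob's definition a little more concrete, here is a minimal, hand-written sketch of the "get data together from different sources and uniformise them" step. It is not Trifacta's method and uses no machine learning; the two source systems, their field names and date formats, and the `wrangle` helper are all hypothetical, invented for illustration. Tools of the kind described above aim to automate rules like these rather than have someone write them by hand.

```python
from datetime import datetime

# Hypothetical exports describing the same customer from two different ERP systems.
ERP_A = [{"CustID": "001", "Name": "ACME LTD ", "Joined": "03/01/2019"}]
ERP_B = [{"customer_id": 1, "name": "Acme Ltd", "joined": "2019-01-03"}]


def from_a(row):
    # System A (assumed): zero-padded string IDs, upper-case names with stray
    # whitespace, and day/month/year dates.
    return {
        "id": int(row["CustID"]),
        "name": row["Name"].strip().title(),
        "joined": datetime.strptime(row["Joined"], "%d/%m/%Y").date(),
    }


def from_b(row):
    # System B (assumed): integer IDs and ISO 8601 dates.
    return {
        "id": row["customer_id"],
        "name": row["name"].strip().title(),
        "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),
    }


def wrangle(a_rows, b_rows):
    """Map both sources onto one schema and merge records that share an id."""
    merged = {}
    for rec in [from_a(r) for r in a_rows] + [from_b(r) for r in b_rows]:
        existing = merged.get(rec["id"])
        if existing is not None and existing != rec:
            # The sources disagree: flag the conflict rather than guess (GIGO).
            raise ValueError(f"conflicting records for id {rec['id']}: {existing} vs {rec}")
        merged[rec["id"]] = rec
    return sorted(merged.values(), key=lambda r: r["id"])


if __name__ == "__main__":
    print(wrangle(ERP_A, ERP_B))
```

Raising an error on conflicting records, instead of silently picking one, is a deliberate choice in this sketch: cleaning up formats is mechanical, but deciding which of two contradictory sources is right is exactly where garbage gets in.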
