We all know about reinforcement learning. It’s typically how we humans learn: Do something good as a toddler and you get ooh-ed and aah-ed at by various relatives, motivating you to do it again. Doing something not so good, like putting your hand on a hot sandwich toaster (yes, I did), can also motivate you to NOT do it again.
But today, Reinforcement Learning is something that we have, at last, learned how to teach a computer to do. As someone who has linear sequentialism in my veins from a deep and dark coding past, I know only too well the pitfalls of applying human intellect to telling a machine what to do: anyone remember the Windows auto-restart days? That approach has led to a world where we just accept that software will be buggy, and we all, pretty much, just live with it.
Reinforcement Learning is a new programming (r)evolution that may eventually rescue us (after a lot of trial and error) from this eternal state of bugginess. It’s the way AlphaGo famously beat Lee Sedol, one of the best Go players in the world, in 2016: trial and error, and learning which outcomes are good and which to avoid.
I am excited about this emerging technology, as I think the availability of heaps of data will allow the trial and error route to find the flaws in the way we humans have designed things. I am particularly interested in its application to corporate process redesign and optimisation, although I think the industry is still in its early days and the technology is still mostly being used in “safe” spaces, like autonomous cars. By safe, I mean that failure is not catastrophic to either the outcome or to your reputation (which could affect future sales).
Very few businesses are going to allow industrialised “trial and error” as a means of discovery just yet. But it will come. This will be another area where the indefatigability of computers will mean they advance more speedily than we can. I will be watching this tech with interest.
