Use 7 simple questions to find machine learning opportunities, even without any technical knowledge
Machine learning, AI, and data science all carry intimidating concepts like deep neural networks, cross-entropy, and optimization.
Enough jargon to scare off all but the most tech-savvy product managers from even thinking about integrating machine learning into their products.
But that, in turn, makes it hard for a company to get the full value out of its machine learning engineers, because most product managers shy away from employing them.
I like to use a dead simple checklist that, in my opinion, any non-techy product manager can use to spot whether a business opportunity lends itself to machine learning or not.
I’ll explain each point with examples below. But first, here is the complete checklist.
The Machine Learning Opportunity Checklist
If I see something that involves data and kind of looks like machine learning or statistics/optimization, I go on and ask myself four questions.
What makes a machine learning opportunity?
- Do small improvements count? Does your opportunity pay off when there is a <10% increase in something?
- Does it involve “always the same stuff”? Does someone have to do the same thing again and again? Does a machine work through long if-then statements? Are there manual workarounds for the “if-then” failures of the machine?
- Does the opportunity need expert knowledge? Does the opportunity need personalization?
- Does the opportunity involve scaling problems due to some manual bottleneck? Does the data behind it grow exponentially?
One Yes is enough to identify an opportunity!
But here’s the catch: some of these opportunities have a sucker attribute that makes them absolutely unsuitable for data science. I like to think that the first four questions get me my “apples”, while the next three questions help me discard the “bad apples”. So I go on with these three questions.
What breaks a machine learning opportunity?
- Do you need 99% or 100% accuracy? Does it need to be very accurate?
- Is the problem solvable with simple if-then?
- Is there only a small amount of data? Or any other data problems like privacy concerns?
One Yes is already enough to identify a bad apple!
Machine Learning Opportunity Makers in Detail
Let’s walk through them step by step, with examples.
(1) Do small improvements count?
Small improvements usually have business value when the opportunity is close to some kind of “conversion”, close to some key “North Star” metric event or the like. In short, small improvements matter when they still carry lots of business value on their own.
For instance, the YouTube North Star metric was for quite some time “minutes of videos watched”. As such, any 1–10% improvement in the recommendation system that suggests new videos yields a direct increase in the North Star metric, so all improvements close to it are machine learning opportunities.
Another example almost any company can come up with is optimizing the margin: whatever you charge on top of the actual cost of production. Any small percentage increase yields quite a bit of money.
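To make that arithmetic concrete, here is a back-of-the-envelope sketch; all revenue and margin figures are made-up assumptions for illustration:

```python
# Back-of-the-envelope: what a small margin improvement is worth.
# All figures below are hypothetical, for illustration only.
annual_revenue = 50_000_000   # yearly revenue in EUR (assumed)
current_margin = 0.10         # 10% margin (assumed)
improvement = 0.01            # a "small" 1-point margin improvement

current_profit = annual_revenue * current_margin
improved_profit = annual_revenue * (current_margin + improvement)
extra_profit = improved_profit - current_profit

print(f"Extra profit from a 1-point margin gain: {extra_profit:,.0f} EUR")
# roughly 500,000 EUR per year
```

Even a single percentage point on a mid-sized revenue base pays for quite a lot of machine learning work.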
(2) Does it involve “always the same stuff”?
Things that involve “always the same stuff” usually yield opportunities that are solved by simple machine learning models.
Simple machine learning models are models that are not perfect, but only around 80% accurate. The idea behind this question is that such a model is usually 10 times easier to build than one that yields 90% accuracy. And yes, I know accuracy is a sucker metric, but I hope it makes the point.
Besides, things that involve “always the same stuff” also produce data that is desperately needed to train any model at all.
An almost classic example is sorting customer emails. Customer emails are usually sorted into a bunch of thematically grouped “stacks” so that a customer service person can work through them accordingly. This is done either completely manually or via simple if-then decisions implemented in some “email router”.
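Such a hand-written router might look like the sketch below; the keywords and categories are hypothetical, not from any real system:

```python
# A sketch of the kind of hand-written "email router" described above.
# Keywords and categories are hypothetical examples.
def route_email(subject: str, body: str) -> str:
    text = (subject + " " + body).lower()
    if "refund" in text or "money back" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account"
    if "broken" in text or "not working" in text:
        return "technical"
    return "general"  # manual fallback for everything the rules miss

print(route_email("Help", "I want my money back"))  # billing
```

The “always the same stuff” signal is exactly this: the rule list keeps growing, the fallback stack keeps filling up, and every routed email doubles as labeled training data for a simple model.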
Uber, for instance, tackled that problem with simple models and its in-house machine learning tool Ludwig, which is dead simple to use. The success apparently spurred them to create a complete machine learning powerhouse tool around it called COTA, which is already in its second version.
(3) Expert knowledge and/or personalization?
Things that involve “expert knowledge” usually mean skilled manual labor. COTA, Uber’s customer support tool, was extended to do just that. After just “sorting mails”, they spotted the second machine learning opportunity: the expert knowledge customer support needs to then answer the mails.
So they used NLP and some other machine learning magic to suggest pre-written replies and rank a bunch of possible answers to an incoming question, which the customer service agent can then use and modify.
Another example can be found in comparison websites. They usually need to classify uploaded images and check whether they look “nice”. To do so, a human has to look at a large number of images and develop an understanding of what is what (yes, expert knowledge: there are dogs that a human can barely distinguish from a cat!) and what is pretty and what is not.
The German comparison company Idealo moved that task partially over to machine learning in 2019 and is now able to use these results in their engine.
Personalization fits into this context really well, because personalization really is just having an expert understanding of your customers and, for instance, displaying the correct advertising material and offerings in an e-mail to the appropriate set of customers. Again, a task found in almost any standard marketing automation machine learning kit.
(4) Manual bottleneck?
Manual bottlenecks are all around. Software can scale really easily these days, but people usually cannot. So a prime task in fast-growing companies is to identify manual bottlenecks and replace them with technology, even if the technology sometimes does a worse job at the task than humans.
In exchange, though, we get scale, which simply isn’t possible with human bottlenecks.
The customer service examples somewhat fall into this category as well, but I like to give another one: the founding story of Automattic, the for-profit entity behind WordPress, which powers around 30% of all websites. It goes something like this: Matt Mullenweg kept writing anti-spam systems with an ever-decreasing half-life; spammers got around them faster and faster.
Matt, it turns out, was a human bottleneck. That’s apparently when he realized that he needed to replace this human bottleneck with a machine. Thus the plugin Akismet and the company Automattic were born.
Machine Learning Opportunity Breakers in Detail
Let’s look at the things that break the opportunities we identified so far and discard the bad apples from our apple basket.
(1) Do you need high accuracy?
High accuracy is achievable with machine learning, and in some cases machines are even more accurate than human experts. But there’s a caveat: high accuracy is insanely more expensive than good accuracy, which is insanely more expensive than average accuracy.
The good news is, in most cases, you actually don’t need high accuracy. The cost of sending out an email twice to a few customers is usually quite low. As is displaying the wrong ads (usually).
But sorting incoming money to the right account is something that actually should be highly accurate!
So ask yourself: “Is it OK if this thing goes wrong 10–20% of the time?” If not, then it’s probably not worth it. If it is, you can build the first iteration with 80% accuracy and still improve it later on, albeit at considerably more cost.
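One rough way to reason about this question is to weigh the value of automating each item against the expected cost of its errors. A hypothetical sketch, with entirely made-up numbers:

```python
# Hypothetical sanity check: does ~80% accuracy still pay off?
# All numbers below are made-up assumptions for illustration.
def automation_pays_off(items_per_month: int,
                        value_per_item: float,
                        accuracy: float,
                        cost_per_error: float) -> bool:
    """True if the value of automation beats the expected error cost."""
    value = items_per_month * value_per_item
    expected_error_cost = items_per_month * (1 - accuracy) * cost_per_error
    return value > expected_error_cost

# Sending a duplicate email is cheap, so 80% accuracy is fine:
print(automation_pays_off(10_000, 0.05, 0.80, 0.01))  # True
# Misrouting payments is expensive, so 80% accuracy is not fine:
print(automation_pays_off(10_000, 0.05, 0.80, 5.00))  # False
```

The same 80% model passes or fails depending entirely on what a single mistake costs, which is exactly the point of this breaker question.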
(2) Do simple if-then do the job?
Every company wants to build a large and expensive recommendation engine. And yet, as a product manager you should always first evaluate two much simpler “if-then” alternatives.
The first is hand-made recommendations. If your customer base is quite homogeneous and mostly buys the same top 10 products, then maybe a hand-written “top sellers” list displayed under every article and on the main page will do the job just fine.
On the other hand, if hand-made recommendations don’t compare well on a value/cost scale, you can still consider a non-machine-learning feature like “top sellers in this category” or “top sellers of the day”. Every dev team can build such a feature; no need to involve specialized machine learning engineers.
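A “top sellers” feature really is just counting; here is a minimal sketch with hypothetical sample data:

```python
# A non-ML "top sellers" feature any dev team can build.
# Purchase records are hypothetical sample data.
from collections import Counter

purchases = [
    ("alice", "coffee maker"), ("bob", "coffee maker"),
    ("carol", "toaster"), ("dave", "coffee maker"),
    ("erin", "toaster"), ("frank", "kettle"),
]

def top_sellers(purchases, n=2):
    """Return the n most frequently bought products."""
    counts = Counter(product for _, product in purchases)
    return [product for product, _ in counts.most_common(n)]

print(top_sellers(purchases))  # ['coffee maker', 'toaster']
```

Scoping the same count by category or by day gives the other variants mentioned above, still with no machine learning involved.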
Only if you feel that these two things really won’t do it should you look into a machine learning recommendation engine. The bonus is that once you get to build a true recommendation engine, you have a baseline to compare against and lots of data on click-throughs, etc.
(3) Is there only a small amount of data?
Despite what people think, and despite the fact that there are of course machine learning techniques for small data sets, from a product manager’s perspective you’ve got to go with the obvious choices, the ones that don’t add large uncertainty or cost to the project.
So if your opportunity does not generate data, or only generates a very small set, then look somewhere else. Besides, opportunities that generate little data usually need “something else” first. If, for instance, you want to help sort actual physical mail via machine learning, by OCRing it and then sorting it, there are only two possible reasons for having little data.
One is that whatever you’re targeting is not in use that often; in that case, the business value of doing any machine learning there is probably low. If there simply is little mail that is currently OCRed, and people do not think about sorting it digitally afterward, it’s probably because it’s not needed.
The other is that it’s in use but does not generate data. If there are tons of physical mail but no one has set up the OCR process yet, then the first step should be to set up the OCR process, enable people to sort things digitally, and check whether that alone already yields improvements.
Now it’s your turn to dig through all the opportunities and projects you have lying around. Are there any apples? If so, are you sure they’re not bad apples?
Hope it helps!