A Simple Matrix to Determine the Costs of the Next Machine Learning Product on Your Roadmap

Motivation
AI and machine learning products are on the roadmap of almost every company today, and the trend is rising. But few companies have ever implemented more than a dozen successful products. According to a recent study, 65% of companies that have invested in AI have not seen any business returns on those investments. In the same study, high costs are reported as a major hurdle. So let’s see how a simple matrix can help us lower the costs of AI products and develop them faster.
Let’s take a look at your company. Ever wondered how expensive it would be to build a recommendation engine for your business? Or a forecaster for your traffic load? Or a classification system for your customer service tickets?

Today, those kinds of products all involve machine learning. All of them are built by machine learning engineers or data scientists, and all of them can be of great use. However, estimating their costs can be pretty overwhelming, especially if your company has implemented fewer than a dozen such products so far.
The Matrix of ML Products
In a presentation given by Ville Tuulos, I stumbled upon four classes that characterize the four main cost categories today’s machine learning solutions fall into. In my eyes, to understand the cost of a product like a recommendation engine, you simply have to answer two questions:
- Question 1 (batch or real-time): Does this product need to process new data in close to real time, or is it ok to have it in a batch process that runs only every few hours or days?
- Question 2 (cost of delay): What if this product breaks and cannot be used? Is that delay ok, or is serious money lost? (Both questions are captured in the small sketch below.)

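To make the two questions concrete, here is the matrix written down as a minimal Python sketch. The function name and quadrant labels are mine, not an official taxonomy:

```python
def quadrant(real_time: bool, outage_is_expensive: bool) -> str:
    """Map the answers to the two questions onto one of the four quadrants."""
    if real_time and outage_is_expensive:
        return "Quadrant one: real-time & serious money lost (most expensive)"
    if not real_time and outage_is_expensive:
        return "Quadrant two: batch & serious money lost"
    if real_time and not outage_is_expensive:
        return "Quadrant three: real-time & cost of delay is low"
    return "Quadrant four: batch & low cost of delay (the cheapest)"


print(quadrant(real_time=True, outage_is_expensive=True))
```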
For every quadrant, you might be able to think of a typical product that fits into it. If not, no worries: let’s explore all of those quadrants in detail and what cost is associated with each. We’ll look at the main cost drivers, the business value, some examples, and then how you can actually lower the cost of the more demanding products.
Quadrant one, “real-time” & “serious money lost”

These products take in data all the time and infer all the time. If they break, serious money is lost, workflows are blocked, and decisions cannot be made. A typical example is the Netflix personalization engine. It’s what assembles the personalized home screen you see when you open Netflix and, of course, many other aspects of what makes Netflix such a pleasure to use.
Every second it runs, it lifts Netflix’s most important metric, the number of new videos started each second, by some percentage points. Indeed, something like 70% of all videos watched on Netflix are started due to some form of recommendation. As a result, every second this system is not working in real time actually costs serious money. And every second, it has to ingest what you just watched and update its inference process.
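To illustrate what “ingest what you just watched and update the inference process” can look like in code, here is a minimal, hedged sketch of incremental updates using scikit-learn’s partial_fit. This is of course not Netflix’s actual system, just the streaming-update pattern:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# An incrementally trainable model, e.g. predicting "will watch" vs. "won't".
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

def on_new_event(features: np.ndarray, label: int) -> None:
    # Called for every fresh viewing event: the model is updated immediately
    # instead of waiting for a nightly batch retraining.
    model.partial_fit(features.reshape(1, -1), [label], classes=classes)

on_new_event(np.random.rand(10), label=1)  # a dummy event with 10 features
```

Keeping such an update loop running, and correct, around the clock is exactly what makes this quadrant expensive.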
Another example would be price forecasting for automobile buyers & resellers like the German “wirkaufendeinauto.de”.

Prices in the automobile market are very volatile and change on a daily basis. Thus, if their pricing system does not forecast properly, they indeed lose money.
Quadrant one, business value & cost
Typically, the business value of such a machine learning product is calculated via the usual product manager magic:
“CTR x conversion x value of conversion”.
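As a worked example with entirely made-up numbers:

```python
# Hypothetical numbers, purely to show how the formula is applied.
ctr = 0.05                   # 5% of users click a recommendation
conversion = 0.10            # 10% of those clicks convert
value_of_conversion = 20     # EUR per conversion
impressions_per_day = 1_000_000

daily_value = impressions_per_day * ctr * conversion * value_of_conversion
print(f"{daily_value:,.0f} EUR/day")  # -> 100,000 EUR/day
```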
Cost: Real-time inference & updates are tough. There are typically two main cost drivers on such products (a minimal sketch of the first follows below):
- Making the service, some dockerized API, highly available. That’s mostly the software engineering side.
- Making the inference process real-time as well as the updates based on the data. That’s the machine learning side of it.
Cost guess: Those solutions need small to large teams to work on them constantly.
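To make the first cost driver tangible, here is a minimal sketch of such a dockerized inference API using Flask. The model file name is a hypothetical placeholder, and a production setup would add load balancing, health checks, monitoring and autoscaling on top, which is precisely where the cost comes from:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# "model.pkl" is a hypothetical, pre-trained model serialized to disk.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [0.1, 0.2, ...]}.
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features]).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```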
Quadrant two, “batch” & “serious money lost”
Let’s consider the case which is, of course, less expensive: we drop the real-time constraint and can go for a batch model.
Typical Example: A customer e-mail classification & sorting system. Such systems take the thousands of e-mails that are sent, for instance, to “info@wordpress.com” and sort them into “queues”, which are then worked on by more specialized customer service personnel, depending on the topic. If something is not sorted correctly, a human has to re-sort it, and context-switching costs usually occur on top.
Other products in this quadrant are customer classification for marketing, churn prediction, and the like. For all those products, batch sorting, classifying, and predicting at regular intervals, say every hour, while using new data for a model update once a week or once a month, is usually fine, as sketched below.
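A minimal sketch of such a batch job, with a made-up training set and queue names, might look like this; in practice, it would be triggered hourly by a scheduler like cron:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Train once on historical, hand-labeled e-mails; re-train weekly or monthly.
train_emails = [
    "my invoice is wrong", "please refund my payment",
    "how do I reset my password", "I cannot log into my account",
]
train_queues = ["billing", "billing", "account", "account"]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(train_emails, train_queues)

# The hourly batch run: classify everything that arrived since the last run.
new_emails = ["refund for order 1234 please", "password reset link is broken"]
print(pipeline.predict(new_emails))  # e.g. ['billing' 'account']
```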
Quadrant two, business value & cost
Again, for the business value you’ll go about this with a formula like “likelihood of winning back a churning customer x lifetime value” or “sorting time saved x value of that time”, and the like.
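Again with made-up numbers, e.g. for the churn product:

```python
# Hypothetical numbers, purely to apply the formula.
churners_flagged_per_month = 500
win_back_rate = 0.20     # likelihood of winning back a flagged customer
lifetime_value = 300     # EUR

monthly_value = churners_flagged_per_month * win_back_rate * lifetime_value
print(f"{monthly_value:,.0f} EUR/month")  # -> 30,000 EUR/month
```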
Cost:
- The SLAs on the inference.
- The development costs, which are typically split across a relatively low number of iterations.
Cost guess: the work of a team for a few weeks to some months. Constant work is not required; the tuning process will take a couple of iterations, but no more.
Quadrant three, “real-time” & “cost is low”
Typical Example: Products in this quadrant are typically meant for decision-makers to look at. They are not part of a critical process, but they can have a very big impact when a decision is made using their output. Cash-flow forecasts, analyses of the “main drivers of customer views”, or “churn prediction” on an aggregate level are all good examples.
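A quadrant-three product can be as simple as a regularly refreshed forecast. Here is a minimal sketch with invented numbers and a deliberately naive trailing-average model, purely for illustration:

```python
import pandas as pd

# Twelve months of (invented) cash flow in kEUR.
cash_flow = pd.Series(
    [120, 135, 110, 150, 160, 140, 170, 180, 155, 190, 200, 175],
    index=pd.date_range("2019-01-01", periods=12, freq="MS"),
)

# Naive forecast: next month equals the trailing three-month average.
forecast = cash_flow.rolling(3).mean().iloc[-1]
print(f"Forecast for next month: {forecast:.0f} kEUR")
```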
Quadrant three, business value & cost
The business value typically lies in the decision that can be made. Those decisions are usually hard to price, so the goal here is to iterate very closely with an exec to get them to use the product often and reliably.
Cost drivers:
- The development time: you’ll need a couple of iterations to get this to work such that it is indeed used often. In contrast to the other quadrants, actually getting a prototype out of the door is a large part of the process here.
- Next to no cost on the SLAs, as it doesn’t matter whether decision-makers look at those numbers today or tomorrow.
Cost guess: one person, a few weeks.
Quadrant four, “batch” & “cost is low”
Typical Example: The data science unicorn. That’s a prototype: a Jupyter notebook, an analysis with some training and evaluation, maybe even an Excel list of results. It’s what a data scientist usually produces before tackling the larger product. But even those one-time, static results can later be turned into a full product.
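In code, the unicorn is little more than a notebook cell. Here is a minimal sketch using a stand-in dataset shipped with scikit-learn; the output is a single number that answers “can this work?”:

```python
from sklearn.datasets import load_breast_cancer  # stand-in for your data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train once, evaluate once: no API, no scheduling, no SLAs.
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print(f"Hold-out accuracy: {model.score(X_test, y_test):.2f}")
```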
Quadrant four, business value & cost
Business Value: It’s either pure prototyping (“can this work?”), or somewhat of a one-time run on something we don’t expect to change in the future, like “let’s run a guess on the main factors of churn in 2020”.
Cost guess: one to a few weeks, one or more data scientists.
So our complete matrix looks like this:

- Quadrant one: real-time & serious money lost, e.g. a personalization engine.
- Quadrant two: batch & serious money lost, e.g. an e-mail sorting system.
- Quadrant three: real-time & cost of delay is low, e.g. decision-support forecasts.
- Quadrant four: batch & cost of delay is low, the data science unicorn.

The chain of costs looks like this:
unicorn < real-time & low cost of delay < batch & serious money lost < real-time & serious money lost.
But How Does This Help Me Make My Machine Learning Product Cheaper?
The magic is: for almost any product you currently have in mind, you can probably choose a cheaper quadrant with roughly the same business value on the first iteration. Not in later iterations, of course, but that’s the main point: you can always use the first iteration to get feedback and a better estimate of the business value.
The real-time, high-cost product: An example would be the aforementioned pricing algorithm of “wirkaufendeinauto.de”. The idea is that when you take your car to a “wirkaufendeinauto.de” shop, they need an accurate estimate of what this car will sell for tomorrow. So what could they do instead of updating that estimate every minute based on price data?
Let’s turn this into a batch product: Of course, they could batch just every night or so. That means intraday price changes are not used, but the estimates would still be pretty good. To mitigate the remaining risk and make the solution more robust, they could simply apply a 1–2% discount on all prices to account for this fact, as sketched below.
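A minimal sketch of that nightly batch run with the safety discount; predict_price and the car data are hypothetical placeholders:

```python
SAFETY_DISCOUNT = 0.02  # 2%, compensating for the intraday moves we miss

def nightly_price_run(cars: dict, predict_price) -> dict:
    # cars: everything that arrived today; predict_price: the daily-updated model.
    return {car_id: predict_price(car) * (1 - SAFETY_DISCOUNT)
            for car_id, car in cars.items()}

prices = nightly_price_run(
    {"car-1": {"make": "VW", "km": 50_000}},
    predict_price=lambda car: 10_000,  # stand-in for the real model
)
print(prices)  # {'car-1': 9800.0}
```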
Let’s further turn this into a batch product with no serious cost on loss: This solution is still pretty expensive, so we could turn it into a cheaper one by using it differently. How would that work? You would employ actual people who do the price estimates (e.g. by looking at a bunch of websites) and let them use the “projected prices” supplied by your solution, which is updated every day. This way, if the machine learning solution breaks, people can still price cars roughly right, and no serious money is lost.
Let’s finally see what we could start with as a unicorn: Of course, we would start with a first, very rough guess and give that to the people doing the price estimations. They could use the “projected price on 2020.01.01”, and if they find it useful, you go on to develop the batch process.
Using the same process, you can turn almost any full-fledged idea into a unicorn.
Now it’s your turn to build your machine learning products!