There are two ways of making data-driven decisions, analytical and experimental, and only one of them is right for you at any given moment! Cynefin & data-driven decision making. By the author. Motivation We need more forecasting, more predictive analytics, generic before & after analysis frameworks, more AI… and yet 65% of managers report no visible value from any of those efforts. I think one important reason analytics projects fail is that companies do not understand when to use analytical data-driven decision making and when to use experimental data-driven decision making. I claim that companies like Netflix, Airbnb and Zynga are so […]
How to Estimate and Lower the Costs of Machine Learning Products
A simple Matrix to determine the Costs of Your Next Machine Learning Product on Your Roadmap. The deeply (ai) confused product manager. Illustrations by the author. Motivation AI and machine learning products are on the roadmap of almost every company today, and the trend is rising. But few companies have ever actually implemented more than a dozen successful products. According to a recent study, 65% of companies with investments in AI have not seen any business returns on them. In the same study, high costs are reported as a major hurdle. So let’s see how we can go about lowering the costs […]
Democratize Data like Zynga, Facebook and Ebay Do
The three pillars of data democratization & self-service analytics. The data law as seen at Zynga, cheap & easy as seen at eBay, and the data infrastructure as seen at Facebook. (all images by the author) Zynga, Facebook, and eBay have been democratizing their data for years now, making it accessible and easy to use for every person in their companies. Data democratization is the foundation of self-service analytics, so let’s see how you can do this too. All three companies are very open about their process, so we can see that, even though they chose three different technical architectures, they all […]
Data Mesh Applied
Moving step-by-step from mono data lake to decentralized 21st-century data mesh. (also check out the follow-up article: three kinds of data meshes) Left: data lakes with central access; right: users accessing data from domain teams providing a great data product. (all images by the author) What does a 21st-century data landscape look like? Zhamak Dehghani from ThoughtWorks gave a beautiful and, for me, surprising answer: It’s decentralized and very different from what we see in almost any company currently. The answer is called a “data mesh”. If you feel the pain of current data architecture in your company, as […]
batect: Build and Test Environments as Code — With a Python Sample Project.
batect: auto-downloads itself on first activation. Listing possible tasks to be run. How long does it take you to onboard a new colleague? It takes roughly two weeks to get someone outfitted with all dev, build and test environments. The concept of Build and Test Environments as Code tries to take that trouble away. And the tool batect makes that possible within Docker. In the words of the “go script concept”: “You know you’re on a mature dev team when your instructions as a new team member are: check out the repo, run ./batect --list-tasks, ./batect setup, and you’re done. I’ll show you how […]
Developing Docker Multi-Service Applications Faster Than Ever
ugly sketch: by me Working with docker-compose? As soon as it involves more than one repository or more than one “service”, things get messy. Cage brings some order to that chaos and is a tremendous help in such setups. To understand Cage you should understand the workflow it is made for, and what the faraday.io people believe is the right workflow to have: One repository for the infrastructure of a multi-service stack. For instance ten docker-compose.yml’s for services that are deployed together. You should then be able to download, with the single command cage pull, all ten CI server-built docker images […]
Continuous Delivery For Machine Learning (CD4ML) Example
I created a GitHub repository with lots of explanations and a step-by-step guide to walk you through a CD4ML setup with DVC on AWS (S3 as storage) on top of a GitLab CI server. Check it out! Blog post will follow.
Testcontainers in Python – Testing Docker Dependent Python Apps
“Python wrestling with the docker-compose squid”, by the author. The Python package testcontainers solves two problems common to Python apps. We develop Python-based applications and deploy them using the AWS ECS-CLI. So we directly deploy a docker-compose configuration into AWS ECS. That configuration wants to be tested locally as well, and I haven’t found a proper solution for that other than the package testcontainers. If you don’t work with docker-compose but with k8s or some other docker orchestrator, you will surely encounter the second use case. It’s to spin up a local container with Postgres, NGINX or Redis to run a small integration […]
Introduction to PyTorch BigGraph — with Examples
Network Photo by Alina Grubnyak on Unsplash PyTorch BigGraph is a tool to create and handle large graph embeddings for machine learning. Currently there are two approaches in graph-based neural networks: Directly use the graph structure and feed it to a neural network. The graph structure is then preserved at every layer. graphCNNs use that approach, see for instance my post or this paper on that. But most graphs are too large for that. So it’s also reasonable to create a large embedding of the graph. And then use it as features in a traditional neural network. PyTorch BigGraph handles the […]
Using Graph CNNs in Keras
GraphCNNs recently got interesting with some easy-to-use Keras implementations. The basic idea of a graph-based neural network is that not all data comes in traditional table form. Instead, some data comes in, well, graph form. Other relevant forms are spherical data or any other type of manifold considered in geometric deep learning. So what does graph data look like if not like a table? Here’s an example: Let’s put some meaning into those variables, and no I’m not gonna use a “citation network” example which would be the default for graph-based neural networks. While easy to […]
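To make "graph form" concrete, here is a minimal sketch in plain Python. It is not the example from the post itself: the node names, edges and feature values are hypothetical and chosen only for illustration. It shows the two pieces a graph-based network typically consumes, an adjacency matrix describing who is connected to whom and one row of features per node.

```python
# Hypothetical toy graph: nodes are people, edges mean "knows each other".
nodes = ["alice", "bob", "carol", "dave"]
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]


def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1  # edge source -> target
        matrix[j][i] = 1  # mirrored entry, because the graph is undirected
    return matrix


# One feature row per node (hypothetical values: age, number of posts).
features = [[34, 12], [28, 3], [45, 40], [19, 7]]

A = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, A):
    print(f"{name:>5}: {row}")
```

The feature rows alone would be an ordinary table. The adjacency matrix is the extra structure that a table cannot express.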