Moving step-by-step from mono data lake to decentralized 21st-century data mesh. (Also check out the follow-up article: three kinds of data meshes.) Left: data lakes with central access. Right: users accessing data from domain teams, each providing a great data product. (All images by the author.) What does a 21st-century data landscape look like? Zhamak Dehghani from ThoughtWorks gave a beautiful and, for me, surprising answer: it’s decentralized and very different from what we see in almost any company today. The answer is called a “data mesh”. If you feel the pain of current data architecture in your company, as […]
batect: Build and Test Environments as Code — With a Python Sample Project.
batect downloads itself automatically on first activation and lists the tasks that can be run. How long does it take you to onboard a new colleague? It takes roughly two weeks to get someone outfitted with all dev, build and test environments. The concept of Build and Test Environments as Code tries to take that trouble away, and the tool batect makes that possible with Docker. In the words of the “go script concept”: “You know you’re on a mature dev team when your instructions as a new team member are: check out the repo, run ./batect --list-tasks, ./batect setup, and you’re done.” I’ll show you how […]
Developing Docker Multi-Service Applications Faster Than Ever
Ugly sketch by me. Working with docker-compose? As soon as more than one repository or more than one “service” is involved, things get messy. Cage brings some order to that chaos and is a tremendous help in such setups. To understand Cage, you should understand the workflow it is made for, and what the faraday.io people believe is the right workflow to have: one repository for the infrastructure of a multi-service stack, for instance ten docker-compose.yml files for services that are deployed together. You should then be able to download, with the single command cage pull, all ten CI server-built Docker images […]
Continuous Delivery For Machine Learning (CD4ML) Example
I created a GitHub repository with lots of explanations and a step-by-step guide that walks you through a CD4ML setup with DVC on AWS (S3 as storage) on top of a GitLab CI server. Check it out! A blog post will follow.
Testcontainers in Python – Testing Docker Dependent Python Apps
“Python wrestling with the docker-compose squid”, by the author. The Python package testcontainers solves two problems common to Python apps. We develop Python-based applications and deploy them using the AWS ECS-CLI, so we deploy a docker-compose configuration directly into AWS ECS. That configuration needs to be tested locally as well, and I haven’t found a proper solution for that other than the package testcontainers. If you don’t work with docker-compose but with k8s or some other Docker orchestrator, you will surely encounter the second use case: spinning up a local container with Postgres, NGINX or Redis to run a small integration […]
Introduction to PyTorch BigGraph — with Examples
Network. Photo by Alina Grubnyak on Unsplash. PyTorch BigGraph is a tool to create and handle large graph embeddings for machine learning. Currently there are two approaches in graph-based neural networks. The first is to use the graph structure directly and feed it to a neural network, so that the graph structure is preserved at every layer. Graph CNNs use that approach; see for instance my post or this paper on that. But most graphs are too large for that. So the second approach is to create a large embedding of the graph and then use it as features in a traditional neural network. PyTorch BigGraph handles the […]
Using Graph CNNs in Keras
Graph CNNs recently got interesting with some easy-to-use Keras implementations. The basic idea of a graph-based neural network is that not all data comes in traditional table form. Instead, some data comes in, well, graph form. Other relevant forms are spherical data or any other type of manifold considered in geometric deep learning. So what does graph data look like if not like a table? Here’s an example: Let’s put some meaning into those variables, and no, I’m not going to use a “citation network” example, which would be the default for graph-based neural networks. While easy to […]
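To make "graph form" a bit more concrete, here is a minimal sketch in plain Python. The node names, edges, and feature values are made up for illustration and are not taken from the post. It stores a tiny undirected graph as an edge list, derives the adjacency matrix from it, and keeps one feature vector per node, which is roughly the pair of inputs a graph-based network works with.

```python
# Hypothetical toy graph: four nodes and the undirected edges between them.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]

# One feature vector per node; this part still looks like a table.
features = {"a": [1.0], "b": [0.0], "c": [2.0], "d": [1.0]}

# Map each node name to its row/column position in the matrix.
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: entry [i][j] is 1 if nodes i and j are connected.
adjacency = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1  # undirected, so mirror the entry


def neighbors(name):
    """Return the names of all nodes connected to `name`."""
    row = adjacency[index[name]]
    return [nodes[j] for j, connected in enumerate(row) if connected]
```

The features alone could live in an ordinary table. The adjacency matrix is the extra piece of information that a table has no natural place for.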
Using S3 Just Like a Local File System in Python
“S3 just like a local drive, in Python”. There’s a cool Python module called s3fs which can “mount” S3, so you can use POSIX operations on files. Why would you care about POSIX operations at all? Because Python also implements them. So if you currently run a Python app and write things to a local file via: with open(path, "w") as f: write_to(f), you can write this to S3 simply by replacing it with: with s3.open(bucket + path, "w") as f: write_to(f). Of course S3 has good Python integration with boto3, so why care to wrap a POSIX […]
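The swap described above can be sketched as follows. This is a minimal illustration, not the post's own code: the helper save and the payload written by write_to are hypothetical, and the S3 part is left commented out because it assumes the s3fs package is installed, AWS credentials are configured, and a bucket exists ("my-bucket" is a placeholder).

```python
import os
import tempfile


def write_to(f):
    """Write a small payload to any writable text file object."""
    f.write("id,value\n1,42\n")


def save(open_fn, path):
    """Open `path` for writing with `open_fn` and write the payload."""
    with open_fn(path, "w") as f:
        write_to(f)


# Local file system: the built-in open() is the opener.
local_path = os.path.join(tempfile.mkdtemp(), "out.csv")
save(open, local_path)

# S3: pass the opener of an s3fs file system instead. Commented out because
# it needs the s3fs package, AWS credentials, and an existing bucket.
# import s3fs
# s3 = s3fs.S3FileSystem()
# save(s3.open, "my-bucket/out.csv")
```

Because write_to only relies on the file object's write method, the same function works for both targets. Only the opener changes.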