Data as Code – Cutting Things Smaller

Domain-Driven Design and microservices have changed the way software engineers work, and I think they can multiply the productivity of data teams as well. But I also think they have to be used slightly differently, because the typical centralized data team is in a very different situation than the typical (decentralized) software dev team. This article explores this different way of working with your central data system. It offers a simple iterative approach, just like microservices: start with the default monolithic one-pot system that captures everything. It then lets you slowly break […]

The Three Biggest Challenges Of Data and Maybe of the Business World

This was originally a “Thoughtful Friday” reserved for newsletter subscribers. If you like the content below, I suggest you subscribe as well! In my head, the three biggest challenges for the world of data are very clear. They are so clear because I believe in two fundamental ideas about our world and its relation to data:

1) Every company -> will turn into a software company -> will turn into a data company, having data at the heart of its business strategy.
2) Data will continue to grow exponentially.

Mainly because I simply believe them, but of […]

3 Questions You Should Answer Before Introducing a Data Mesh

“However, new clients unaccustomed to his methods said that Drucker’s consulting was frustrating at first.[…]. Drucker on the other hand asked many questions but refused to answer ones about what to do, saying that this was not his job. Yet the questions he asked his clients led them to come up with solutions, which repeatedly led to success, and they usually hired him again.” (Taken from “Peter Drucker’s very different consulting model”, but “Peter Drucker on Consulting” tells the same story.) Peter Drucker had a consulting model which I admire because it had three components I value. One, it forced […]

Why the Data Mesh Sucks

… you into three fallacies: the need to build any platform at all, the need to build a data mesh, and lots of coupling inside the platform. (based on Photo by Thomas Tucker on Unsplash) “The most valuable resource is no longer oil, but data,” to put it in the words of The Economist. Extracting that value from data is the new frontier for companies of this century. Data meshes appeared in 2019 to fundamentally change efforts around data. In my words, data meshes are pretty simple (but not easy). “An organization has a data mesh, if it has […]

How To Build the Next Mega Open Source Project

8 lessons learned from a 3-dimensional framework for understanding how to turn your open-source project into the next WordPress or Linux. (Image by Sven Balnojan, on basis of the photo by Markus Spiske on Unsplash) “Open Source projects exhibit natural increasing returns to scale. That’s because most developers are interested in using and participating in the largest projects, and the projects with the most developers are more likely to quickly fix bugs, add features and work reliably across the largest number of platforms. So, tracking the projects with the highest developer velocity can help illuminate promising areas in which to get […]

COSS: 7 Models to Develop & Price Open-Core Products

How a commercial open-source software company should develop & price open-core products to fight the hyper-cloud “service-wrappers”. (Photo by Tim Mossholder on Unsplash; Are you still open for contributions?) dbt Labs, formerly Fishtown Analytics, recently raised a large Series C. In the announcement blog post, Tristan Handy, CEO of dbt Labs, outlined the major risk he currently sees for open-core products like the one dbt Labs sells: commoditization by the hyper-clouds. Turns out, Sid Sijbrandij, CEO of GitLab, also a company selling an open-core product, thinks along very similar lines. He has a thorough analysis of how an open-core product can […]

How to Become The Next 30 Billion $$$ Data Company

14 thoughts on the economics of the open-source data space and how to become the next MongoDB or Databricks. (Image by the author.) The data space is booming, with companies like MongoDB (valued at 18 billion USD), Databricks (30 billion), Confluent, and many others. The startup space is overflowing with money, and lots of founders want a share of the pie. But in my opinion, the data space is set up to be dominated by open-source solutions in the near future. Open-source spaces have a very clear winner-takes-most dynamic, making them extremely hard to compete in. And […]

Data as Code — Principles, What it is and Why Now?

No, DaC is not just versioning data! It’s applying the whole software engineering toolchain to data. For that, we need principles. This post is part of a small series beginning with: Data as Code — Achieving Zero Production Defects for Analytics Datasets. Image by Sven Balnojan. Data as Code is a simple concept. Just like Infrastructure as Code. It just says “Treat your data as code”. And yet, after IaC appeared on the ThoughtWorks Radar in 2011, it still took roughly 10 years to “settle in” and is still on an uneasy spot where IaC advocates feel they need to remind people […]

How Conway’s Corollary Wrecks Your Data Organization

Opinion Conway’s law has an evil corollary that goes unnoticed in the dev world but wrecks your data org. “You think it’s a hack, but all you’re hacking apart is the value of your data.” (the author) Image by Sven Balnojan. Melvin Conway, a brilliant computer scientist who also invented the notion of a coroutine, has become pretty famous in the last 20 years for a law named after him: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.” Turns out this is very important as we move towards […]

Data as Code — Achieving Zero Production Defects for Analytics Datasets

Notes from Industry. How to apply the true Data as Code philosophy to achieve close to zero production defects using the tried & true methods from software development on data. Yep, zero defects! That’d be awesome. (Image by the author.) Data teams spend close to 60% of their time on operational things, not producing value. They also experience a large number of bugs in their data systems, according to the DataKitchen study & Gartner’s survey. Yet, in the software development world, we already have the philosophies in place that allow high-performing teams to deliver both quickly and at a high level of quality, […]
