Understanding decentralization will help you understand, evaluate & adapt to current technology trends.
“Decentralization is based on the simple notion that it is easier to macrobull***t than microbull***t. Decentralization reduces large structural asymmetries.” ― Nassim Nicholas Taleb, Skin in the Game: Hidden Asymmetries in Daily Life
The human body is an amazing system. Optimized over a couple of million years, it seems to work pretty well. Scholar and statistician Nassim Taleb is a huge fan of nature, and as I recently reread some of his work, what stuck with me is the level of decentralization that nature built into our bodies.
We have two kidneys, and if one fails, things will be fine. I tore one of the ligaments in my foot, and luckily nature provided me with three, so I can go on as if nothing happened.
It also occurred to me that most of the revolutions happening in technology companies, whether microservices, data meshes, or micro frontends, really come down to this one fundamental idea that nature has been applying for millions of years: decentralization.
What is that exactly? I found this quote which I like:
“A strict definition of decentralized systems could be that each system in the structure must fulfill specified demands on interaction with other systems, but it should be possible to develop (and change) the inner structure in each system, including data storage, without dependencies to other systems, as long as the specified interaction stands. It must for instance be possible to insert systems of different origins into the structure. The main condition is that each system must interact with other systems as specified.” – Mats-Åke Hugoson, “Centralized versus Decentralized Information Systems A Historical Flashback”
Computer science as a discipline is pretty new, only a couple of decades old. So it keeps reinventing itself, which makes it hard to keep up with current best practices. It’s not easy to understand how to adapt a new idea to one’s own “system” or company. In the case of trends like microservices, data meshes, or micro frontends, however, I think we can simply look at the timeless principles that have been underlying the idea of decentralization in nature all along.
The five essential principles
There are five essential principles, which can be grouped into two categories.

Three are drivers of the robustness of decentralized systems over centralized ones:

- Redundancy
- Reduced interconnectedness
- Diversification

Two are drivers of the cost of decentralized systems over centralized ones:

- Complexity
- Gluing things back together
As we are talking about man-made systems here, it’s up to you to choose any level of redundancy, interconnectedness, or diversification for your technology organization. It’s up to you to influence these drivers for certain subareas like data production, which then could result in a “data mesh.” And it is up to you to decide what your data mesh, your approach to microservices, or your micro frontend approach should look like, adapted to your company’s unique relation to these five principles.
Here’s a closer look at these five principles.
The first driver of robustness is redundancy
The idea is simple: if you have at least two parts in a system, they are redundant if one can fail without crashing the whole system. The ligaments in my foot are functionally mostly redundant; one snapped, but I’m still able to do sports just as before. My arms are redundant too, though not completely functionally redundant: with one, I cannot do everything I can do with two. Still, having two is redundancy.
The company Spotify implements its application in a redundant way. It embraces the concept of micro frontends, although as far as I can tell it does not call it that. It means that every team essentially owns one small “tile” in the large window that makes up the application. So if the team owning the “Discover Weekly” part has a system crash, you can still browse & listen to your music and will barely notice. If the search crashes, you even have a mostly functionally redundant feature in the browsing function.
The company Spotify thus decided to turn the dial on redundancy up to the max it could imagine.
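A minimal sketch of this kind of redundancy, with hypothetical tile names and render functions (the real Spotify architecture is of course far more involved): each “tile” renders independently, so a crash in one is contained instead of taking down the whole page.

```python
# Sketch: a page composed of independent "tiles" (micro frontends).
# A failure in one tile is contained; the rest still render.

def render_page(tiles):
    """Render every tile, replacing crashed ones with a placeholder."""
    rendered = {}
    for name, render in tiles.items():
        try:
            rendered[name] = render()
        except Exception:
            rendered[name] = "<unavailable>"  # the rest of the app survives
    return rendered

def search_tile():
    return "search box"

def discover_weekly_tile():
    raise RuntimeError("team deployment gone wrong")  # simulated local crash

page = render_page({"search": search_tile, "discover_weekly": discover_weekly_tile})
print(page)  # -> {'search': 'search box', 'discover_weekly': '<unavailable>'}
```

The failing tile degrades into a placeholder while the search tile keeps working, which is exactly the redundancy argument: one part can fail without crashing the whole system.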
The second driver of robustness is minimizing interconnectedness
Your two kidneys do not share the same “incoming door”; there are two separate branches of the renal artery, which come from the central descending aorta. Thus the interconnectedness is minimized. If one renal artery is blocked, the other will simply pick up the flow.
The same idea plays out in the concept of microservices. Microservices are an architectural pattern that basically tries to break things down, decouple them, and expose some kind of interface for them, a clear contract as the only way “in & out.” It’s just like our two kidneys, which of course both have their own ureter, just as each microservice has its own point of data storage. That way, both the data storage and the ureter can differ considerably, depending on the incoming & outgoing “stuff.” If either the data storage or the ureter fails, the whole system will still be intact because of minimized interconnectedness.
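A minimal sketch of that contract idea, using two hypothetical services: each service has its own private storage, and the only interaction between them goes through an explicit interface.

```python
# Sketch: two services, each with its own private data store,
# interacting only through a narrow, explicit contract.

class OrderService:
    def __init__(self):
        self._store = {}  # private storage; no other service touches it directly

    # The contract: the only way "in & out" of this service.
    def place_order(self, order_id, item):
        self._store[order_id] = item
        return order_id

    def get_item(self, order_id):
        return self._store[order_id]

class BillingService:
    def __init__(self, orders):
        self._invoices = {}    # its own, separate storage
        self._orders = orders  # depends only on the contract above

    def bill(self, order_id):
        item = self._orders.get_item(order_id)  # via the interface, not the store
        self._invoices[order_id] = f"invoice for {item}"
        return self._invoices[order_id]

orders = OrderService()
billing = BillingService(orders)
orders.place_order("o1", "premium subscription")
print(billing.bill("o1"))  # -> invoice for premium subscription
```

Because `BillingService` never reaches into `_store`, the order team can swap its storage technology at will, and a failure inside one store stays local to that service.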
Amazon’s API Mandate
Following the so-called “API Mandate” from founder & CEO Jeff Bezos, the company Amazon transformed its IT landscape into one focused on minimizing interconnectedness. In fact, the original “API Mandate” makes it pretty clear that the only way in and out of a team’s internal systems is through an interface that must be designed such that it could even be used by someone outside the company.
This, for instance, led to a separation of the data storage of different teams, which in turn leads to much more robustness in case of local breakdowns. But it also led teams to adopt redundant and fault-tolerant communication practices. Besides increasing the robustness of the system, it also increased flexibility: without dependencies, any team is free to choose its complete technology stack, deploy different programming languages, and choose whatever gets the job done best.
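That flexibility can be sketched as follows, with a hypothetical team API: callers see only the contract, so the team can swap its entire internal stack without anyone outside noticing.

```python
# Sketch: the "API Mandate" idea with a hypothetical team interface.
# A team's data is reachable only through a contract designed as if an
# outsider were calling it; the internals behind it can change freely.

class RecommendationsAPI:
    """The public contract; the only way in and out of the team's system."""
    def top_tracks(self, user_id):
        raise NotImplementedError

class InMemoryRecommendations(RecommendationsAPI):
    def __init__(self):
        self._db = {"ada": ["track-1", "track-2"]}  # private; never exposed directly

    def top_tracks(self, user_id):
        return self._db.get(user_id, [])

class CachedRecommendations(RecommendationsAPI):
    """A different internal stack; callers cannot tell the difference."""
    def __init__(self, inner):
        self._inner, self._cache = inner, {}

    def top_tracks(self, user_id):
        if user_id not in self._cache:
            self._cache[user_id] = self._inner.top_tracks(user_id)
        return self._cache[user_id]

api = CachedRecommendations(InMemoryRecommendations())
print(api.top_tracks("ada"))  # -> ['track-1', 'track-2']
```

Both implementations honor the same contract, so the team behind `RecommendationsAPI` could move from an in-memory store to a cached one (or to an entirely different language or database) without breaking any consumer.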
The third driver of robustness is diversification
Your immune system is different from your usual organs. It’s dispersed throughout your body and is composed of a variety of different proteins and cells which are meant to react to very different kinds of foreign material.
The little units in our immune system are diversified: they belong to the same subsystem of the system called the human body, and all have the same mission, to keep the body healthy, but they are all very different. And since our immune system has to respond to a lot of unknowns, as well as changes over time, diversification is the only way to keep it working.
Decentralized systems can have very different degrees of diversification, allowing for different robustness & autonomy levels. The concept of a “data mesh” as used at Netflix provides a rather large diversification level. A data mesh mostly means that a team that produces data really owns it, in the sense that it is also responsible for serving it to potential end-users.
Netflix data architecture
But Netflix does not put a tight bound on how teams are supposed to serve their data. Instead, they allow teams to serve data in any of the standard technologies in use at Netflix. Teams can choose AWS S3, Redshift, RDS instances, or Druid to provide data to end-users. This diversification provides much better adaptability, both to individual end-user needs and to changes in the environment like new kinds of end-users.
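The setup can be sketched like this, with hypothetical dataset names and heavily simplified stand-ins for the real backends: every producing team picks its own serving technology, as long as it honors one common contract for handing out data.

```python
# Sketch: data-mesh style diversification (hypothetical, simplified backends).
# Each team owns its data and chooses how to serve it; all honor one
# common "serve" contract so consumers have a uniform way to ask.

class S3Dataset:
    def serve(self, key):
        return f"s3://playback-team-bucket/{key}"   # e.g. files on object storage

class WarehouseDataset:
    def serve(self, key):
        return f"SELECT * FROM billing.{key}"       # e.g. a Redshift-like warehouse

class KeyValueDataset:
    def serve(self, key):
        return {"table": key, "rows": []}           # e.g. an RDS/Druid-style store

# The mesh: diversified backends, one uniform way to request data.
mesh = {
    "playback_events": S3Dataset(),
    "invoices": WarehouseDataset(),
    "profiles": KeyValueDataset(),
}

for name, dataset in mesh.items():
    print(name, "->", dataset.serve(name))
```

Each team is free to pick the backend that fits its data best; the only thing the mesh standardizes is the `serve` contract, which is what keeps the diversification manageable.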
While these three things drive robustness and autonomy throughout companies, they also have a cost.
The first cost driver is complexity
Decentralization means that with every step we decentralize, we also add complexity, because where previously there was one, there now are possibly two. The degree might differ, depending on the degree & kind of decentralization, but the trend is always the same. The micro frontends at Spotify are great; they allow for the flexibility that every team can work on its system individually. Teams can launch changes individually and even crash individually. But what previously was one subsystem, the application, just became 10+ subsystems.
Spotify’s complexity cost
That means the part that is not decentralized suddenly has to deal with this complexity. And of course there are such parts; otherwise, the individual tiles would be owned by separate companies, not teams. It means, for instance, that if a developer wants to change teams at Spotify, they have to relearn the way of building their “tile.” It means that if someone from product management wants to get a picture of the roadmap of the complete app, they now have to look at ten different roadmaps, or someone has to synchronize them. It means that if a central party wants to fix a security bug in the database technology used across Spotify, they have to do that ten times, not just once. So the mere existence of multiple decoupled teams creates complexity costs down the road.
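The arithmetic behind that last example is worth making explicit. A tiny sketch with hypothetical numbers: a fix that once touched one shared stack now touches every team’s stack.

```python
# Sketch: the complexity cost of decentralization (hypothetical numbers).
# A central fix that once touched 1 subsystem now touches every team's stack.

teams = 10                 # decoupled tiles/services after decentralizing
fix_effort_per_stack = 2   # e.g. days to patch one team's database technology

centralized_cost = 1 * fix_effort_per_stack        # one shared stack
decentralized_cost = teams * fix_effort_per_stack  # one fix per team's stack

print(centralized_cost, decentralized_cost)  # -> 2 20
```

The multiplier grows linearly with the number of decoupled units, which is why the cost side of decentralization tends to surprise organizations only after they have split things up.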
The second cost driver is not about the individual teams, the tiles, but about the glue that holds the window together.
The second cost driver is the need to glue things back together
Mushrooms have many bodies, fruit bodies, connected by a network called “mycelium.” But for humans, nature decided not to tie our bodies together. Probably because it would’ve made walking a pretty weird exercise, being glued to one’s family.
The point is, complex systems that have autonomous subparts need to be glued back together somehow. They need to be connected, otherwise they wouldn’t be one system. But that connectedness also comes at a cost. For mushrooms, it means that although the fruit bodies are completely independent, a new fruit body is located pretty close to the others. It shares all resources with the other “family members.” This is a significant cost for a mushroom, but probably a trade-off well worth it in terms of survival. (Check out Jennings & Lysek’s “Fungal Biology: Understanding the Fungal Lifestyle” for details.)
Gluing data costs at Netflix
The same cost driver is out there in the technology world. Netflix’s data mesh provides large diversity, but that diversity also drives up the cost of gluing things back together. An individual data engineer or BI analyst at Netflix who wants access to a large variety of data will either need to carry that cost individually, or the company Netflix has to shoulder it.
In this case, Netflix chose to shoulder that cost, for instance by creating tools like Metacat that essentially glue back together the different data sources like AWS S3, RDS databases, or AWS Redshift. If the diversity weren’t there, if there were only one storage technology, there would be no need to provide a large-scale solution to glue things together.
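A minimal sketch of such a glue layer, with made-up metadata and no resemblance to Metacat’s actual API: a federated catalog pulls metadata from each diverse backend and normalizes it into one uniform shape, so an analyst can discover data without learning every store.

```python
# Sketch: a catalog that "glues back together" diverse data sources
# (hypothetical, simplified metadata; not the real Metacat API).

def s3_metadata():
    return [{"bucket": "events", "object": "plays.parquet"}]

def redshift_metadata():
    return [{"schema": "billing", "table": "invoices"}]

def catalog(sources):
    """Normalize each backend's metadata into one uniform record shape."""
    entries = []
    for backend, fetch in sources.items():
        for raw in fetch():
            # collapse backend-specific fields into a single dotted name
            name = ".".join(str(v) for v in raw.values())
            entries.append({"backend": backend, "name": name})
    return entries

unified = catalog({"s3": s3_metadata, "redshift": redshift_metadata})
print(unified)
# -> [{'backend': 's3', 'name': 'events.plays.parquet'},
#     {'backend': 'redshift', 'name': 'billing.invoices'}]
```

The catalog itself is the cost: someone has to build and maintain the normalization for every backend the teams are allowed to choose, which is exactly the price of diversification.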
When talking about nature & technology systems, it’s easy to forget that we’re actually dealing with humans.
Decentralizing Human Systems
When I wrote down the three drivers of robustness of decentralized systems, I felt like I was missing an important part: the human part.
Today we’re not just decentralizing to make systems more robust; we’re usually decentralizing human systems. Thus the aim is, on the one hand, to use redundancy, minimal dependencies, and diversification to increase robustness. On the other hand, we’re also aiming to use autonomy & responsibility to increase the productivity of the individual units.
That decentralization leads to more autonomy, and that autonomy makes people happier and more productive, was already understood by Sun Tzu and Peter Drucker, and probably today most famously by Daniel Pink, author of “Drive”.
All the techniques mentioned above, microservices et al., are techniques changing human systems. All of them award more responsibility & autonomy to a unit. A unit doesn’t have to be a team; it can be an organizational unit, a single person, etc. Microservices put all the responsibility into a team’s hands, especially the operational responsibility, following the mantra “You build it, you run it.” Data meshes put the responsibility to deal with end-users of data and serve them properly into the team’s hands. Micro frontends put the responsibility of building the actual frontend components into the team’s hands.
So decentralization makes sense whenever we want to either increase the robustness of our system or increase the productivity of our teams.
These principles help me understand how these trends impact me, my system, or the company I work at. They help me understand when having micro frontends is a smart idea, and to what extent a data mesh should be glued back together. They help me see why certain companies choose one specific application of microservices over another. They help me understand why certain companies go for a fully standardized infrastructure stack while others choose a well-harmonized “Infrastructure as a Service” approach and let teams do whatever they want.
I hope they help you too.
- Spotify was a very early adopter of the micro frontend concept. When they adopted the methodology, it wasn’t called that. But you can get an explanation of it from their engineering culture videos from back in 2014: https://engineering.atspotify.com/2014/03/27/spotify-engineering-culture-part-1/, https://engineering.atspotify.com/2014/09/20/spotify-engineering-culture-part-2/
- I don’t think Netflix actually calls what they do a data mesh, but from https://netflixtechblog.com/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520 it seems that’s exactly what they’re building.
- The immune system is nicely explained here: https://primaryimmune.org/immune-system-and-primary-immunodeficiency
- Jeff Bezos “API Mandate” is explained in more detail here: https://api-university.com/blog/the-api-mandate/