Guest post originally published on Mia-Platform’s blog
Data is one of the most critical components of any business, as it allows us to personalize and customize our products for potential consumers. As important as data is, studies have shown that about 50-70% of the data collected by organizations goes unused and becomes what Gartner calls Dark Data. We can attribute this large amount of unused data to inefficiencies in the systems that manage it.
This post discusses how methods like Data Meshes and Data Fabrics, which have emerged in the past decade, can help mitigate the problems associated with data management.
At the end of this post, you should understand what Data Meshes and Data Fabrics are, their differences, and why one may overtake the other.
What is a Data Mesh?
According to IBM, a Data Mesh is a decentralized data architecture that organizes data by a specific business domain, providing more ownership to the producers of a given dataset. By decentralizing data, a Data Mesh offers an alternative to the central data lake and central data team culture that have been present in companies for decades. It is important to note that a Data Mesh is language-agnostic and technology-agnostic, because it is an approach that focuses more on organizational changes.
Principles of a Data Mesh
Data Meshes are built on four fundamental principles, which are discussed in the paragraphs below:
- The domain ownership principle: domain ownership enables the decentralization of data, meaning that the domains that need a particular kind of data are tasked with gathering, cleaning, and managing the ingestion of that data. This principle requires domains to take responsibility for their data.
- The data as a product principle: this Data Mesh principle explains that there are consumers for the data besides the domain responsible for the data. This principle requires that the data be considered and taken care of as an actual product.
- The self-serve data infrastructure platform principle: this principle requires that the technical complexities of the infrastructure for creating data be abstracted away. This abstraction is needed because it is complex and challenging to replicate the infrastructure required to build, execute, and monitor a data product in each domain that needs the data. This principle allows data consumers in other domains to focus on using the data instead of recreating the infrastructure.
- The federated governance principle: this Data Mesh principle enables data standardization across the organization. As multiple domains can use the data produced by one domain, an organization must standardize formatting, governance, and other data features to enable collaboration and understanding.
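The four principles can be illustrated with a small sketch. It is not a reference implementation: the `DataProduct` and `SelfServePlatform` classes, the `REQUIRED_METADATA` standard, and the sample "orders" product are all hypothetical names chosen for this example. A domain owns and publishes its data as a product, a shared platform hides the publishing mechanics, and an organization-wide metadata standard is enforced at publish time.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide standard (federated governance):
# every data product must declare these metadata fields.
REQUIRED_METADATA = {"owner_domain", "schema_version", "description"}


@dataclass
class DataProduct:
    """A dataset that one domain owns and publishes for other domains."""
    name: str
    metadata: dict
    records: list = field(default_factory=list)

    def governance_violations(self) -> list:
        """Return the required metadata fields this product is missing."""
        return sorted(REQUIRED_METADATA - set(self.metadata))


class SelfServePlatform:
    """Shared infrastructure, so each domain need not rebuild publishing."""

    def __init__(self) -> None:
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        # Reject products that do not meet the shared standard.
        violations = product.governance_violations()
        if violations:
            raise ValueError(f"{product.name} is missing metadata: {violations}")
        self._products[product.name] = product

    def read(self, name: str) -> list:
        # Consumers read the product without knowing how it was built.
        return list(self._products[name].records)


platform = SelfServePlatform()
orders = DataProduct(
    name="orders",
    metadata={
        "owner_domain": "sales",  # domain ownership
        "schema_version": "1.0",
        "description": "Confirmed customer orders",
    },
    records=[{"order_id": 1, "total": 42.0}],
)
platform.publish(orders)
```

A consuming domain would then call `platform.read("orders")`, while a product published without the required metadata is rejected with a `ValueError`.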
What is a Data Fabric?
As defined by IBM, a Data Fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through intelligent and automated systems. It is adaptive, flexible, and secure, and it ensures a consistent user experience across all integrated environments.
With Data Fabric, we can monitor and manage our data applications regardless of where they live.
At the center of the Data Fabric is rich metadata, which enables the automation of data integration, engineering, and governance between data providers and consumers.
Responsibilities of a Data Fabric
Alongside automation, the Data Fabric is tasked with the following responsibilities.
- Accessing the data
The Data Fabric architecture is tasked with aggregating data from various sources. It is important to note that the Data Fabric supplies a virtualization layer that allows us to collect the data without copying or moving it.
Coupled with the virtualization layer, a Data Fabric architecture should boast robust data integration and extract, transform, and load (ETL) tools to move the data when necessary.
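As a rough illustration of the virtualization idea, the sketch below keeps only references to its sources and reads from them at query time, so nothing is copied up front. The `VirtualLayer` class and the two in-memory lists standing in for a CRM database and a data lake are assumptions made for this example; a real Data Fabric would sit on top of actual connectors.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


class VirtualLayer:
    """Holds references to sources and reads them only when queried."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        # Store a reader function, not a copy of the data.
        self._sources[name] = reader

    def query(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # Rows are pulled lazily from each source, filtered, and tagged
        # with the name of the source they came from.
        for name, reader in self._sources.items():
            for row in reader():
                if predicate(row):
                    yield {**row, "_source": name}


# Hypothetical sources: plain lists stand in for real systems.
crm_db = [{"customer": "Ada", "country": "IT"}]
lake_files = [{"customer": "Linus", "country": "FI"}]

layer = VirtualLayer()
layer.register("crm", lambda: crm_db)
layer.register("lake", lambda: lake_files)

# A row added to the source after registration is still visible,
# because the layer never took a snapshot.
crm_db.append({"customer": "Grace", "country": "IT"})
italian = list(layer.query(lambda row: row["country"] == "IT"))
```

Because the layer reads through to the sources, the query returns both Italian customers, including the one added after registration.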
- Managing the lifecycle of the data
After collecting data from different sources, the Data Fabric ensures privacy and data compliance with regulations.
- Governance and Privacy
Data Fabrics ensure that the right people access the right data. To achieve this level of privacy, Data Fabrics use active metadata to automate policy enforcement.
These policies can require that certain aspects of the data be masked and that access be role-based. They also require rich lineage information for the data, which means that the data source, the transformations applied to the data, and so on should all be recorded. Rich lineage information helps us fact-check the data and optimize for quality.
The Data Fabric architecture ensures that the data complies with regulations such as the General Data Protection Regulation (GDPR), the Fair Credit Reporting Act, etc.
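Role-based masking and lineage tracking can be sketched in a few lines. Everything named here is hypothetical: the `POLICY` table plays the part of active metadata, and `GovernedDataset` is a toy class that records each transformation and masks columns according to the reader's role.

```python
from dataclasses import dataclass, field

# Hypothetical active metadata: sensitive columns and the roles
# allowed to see them unmasked. Unlisted columns are visible to all.
POLICY = {
    "email": {"allowed_roles": {"compliance"}},
    "salary": {"allowed_roles": {"compliance", "hr"}},
}


def _visible(column: str, role: str) -> bool:
    rule = POLICY.get(column)
    return rule is None or role in rule["allowed_roles"]


@dataclass
class GovernedDataset:
    rows: list
    lineage: list = field(default_factory=list)

    def transform(self, description: str, fn) -> "GovernedDataset":
        """Apply fn to every row and append the step to the lineage."""
        return GovernedDataset(
            rows=[fn(row) for row in self.rows],
            lineage=self.lineage + [description],
        )

    def read(self, role: str) -> list:
        """Return the rows with restricted columns masked for this role."""
        return [
            {col: (val if _visible(col, role) else "***") for col, val in row.items()}
            for row in self.rows
        ]


raw = GovernedDataset(
    rows=[{"name": "Ada", "email": "ada@example.com", "salary": 5000}],
    lineage=["source: hr_database.employees"],
)
yearly = raw.transform(
    "salary converted from monthly to yearly",
    lambda row: {**row, "salary": row["salary"] * 12},
)
hr_view = yearly.read("hr")
analyst_view = yearly.read("analyst")
```

In this sketch the `hr` role sees the salary but not the email, an `analyst` sees neither, and `yearly.lineage` lists the source followed by the transformation that was applied.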
- Exposing data
Next, the Data Fabric is tasked with exposing the data to different data consumers through enterprise search catalogs.
What are the differences between Data Meshes and Data Fabrics?
As both data paradigms are created to aid data gathering, governance, and distribution, it is easy to notice similarities between them. However, the differences are also apparent and should be considered before an organization chooses a paradigm.
This section discusses the differences between the Data Mesh and Data Fabric paradigms.
- Decentralized vs. centralized data storage
In a Data Mesh, data is distributed in domains, with no single necessary control point.
In a Data Fabric, data access is centralized, with high-speed servers for networking and high-performance resource sharing.
- Automation vs. human inclusion approach
Data Meshes treat data as a product and rely on domain owners to drive the requirements for the data product.
Data Fabric relies on automation for discovering, governing, suggesting, and delivering data to data consumers. This automation is based on a rich metadata foundation.
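One small piece of this automation, suggesting datasets to a consumer, can be sketched as a ranking over metadata tags. The `CATALOG` contents and the `suggest` function are hypothetical, and Jaccard similarity is just one simple scoring choice made for this example, not something the Data Fabric paradigm prescribes.

```python
# Hypothetical metadata catalog: dataset name -> descriptive tags.
CATALOG = {
    "orders": {"sales", "customers", "revenue"},
    "web_sessions": {"marketing", "customers", "clickstream"},
    "payroll": {"hr", "finance"},
}


def suggest(interests: set, catalog: dict = CATALOG, limit: int = 2) -> list:
    """Rank datasets by Jaccard similarity between tags and interests."""

    def score(tags: set) -> float:
        # Shared tags divided by all distinct tags across both sets.
        return len(tags & interests) / len(tags | interests)

    # Highest score first; ties are broken alphabetically by name.
    ranked = sorted(catalog, key=lambda name: (-score(catalog[name]), name))
    # Drop datasets that share no tag with the consumer's interests.
    return [name for name in ranked if score(catalog[name]) > 0][:limit]


suggestions = suggest({"customers", "revenue"})
```

For a consumer interested in customers and revenue, `orders` ranks first because two of its three tags match, followed by `web_sessions`, while `payroll` is left out because it shares no tags.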
The Data Mesh paradigm is language- and technology‑agnostic and focuses more on organizational changes. The Data Mesh architecture follows a domain‑driven design and product thinking to overcome data challenges.
Data Fabric is a more technical data integration solution. The Data Fabric architecture is also more compatible with technical, business, and operational data.
Which paradigm to choose?
It is conceivable that Data Fabrics will take the lead in efficient data management in the coming years. Data Fabrics connect the entire organization's data and facilitate frictionless data sharing.
Because Data Fabrics center on automation, we can optimize data management and send real‑time insights and analytics to data users. Moreover, Data Fabrics offer increased security: the virtualization layer ensures that the data is not unnecessarily moved. Data Fabrics are also cost‑efficient.
However, Data Meshes and Data Fabrics are not mutually exclusive. A Data Fabric can enable a Data Mesh implementation by using its metadata insights to automate repetitive tasks. With a Data Fabric, data owners in the Data Mesh paradigm gain the capabilities they need to create data products.
This article discussed the Data Mesh and Data Fabric paradigms, their differences, and, more importantly, which data management method is expected to take the lead in the coming years.
Mia‑Platform Fast Data is a perfect example of the cohabitation between the paradigms, and it can help you shift from one to the other if needed. To understand more about Mia‑Platform Fast Data, check out this article.