Author: Saptarshi Sengupta
Despite recent and evolving technological advances, the vast amounts of data that exist in a typical enterprise are not always available to all stakeholders when they need them. Modern enterprises have broad sets of users with varying levels of skill who strive to make data-driven decisions daily but struggle to access the data they need in a timely manner.
True democratization of data is more than putting data at users' fingertips through a set of applications. It also involves better collaboration among peers and stakeholders for data sharing and data recommendation, metadata activation for better data search and discovery, and providing the right kind of data access to the right set of individuals. Deploying an enterprise-wide data infrastructure with legacy technologies such as ETL is costly, slow, and resource-intensive, and it cannot provide data access in real time. Worse, constant replication of data exposes companies to costly compliance issues involving sensitive and private data such as personally identifiable information (PII).
As enterprise data becomes more distributed across cloud and on-premises locations around the globe, achieving seamless real-time data access for business users is becoming a nightmare. Modern integration styles such as the logical data fabric architecture use data virtualization to help organizations realize the promise of seamless access to data, democratizing the data landscape. When organizations adopt a logical data fabric architecture, they create an environment in which data access and data sharing are faster and easier to achieve, because business users can access data with minimal IT involvement. If properly constructed, a logical data fabric also provides the necessary security and data governance in a centralized fashion.
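The following is a minimal Python sketch of the data virtualization idea, built on simplifying assumptions. Two in-memory SQLite databases stand in for separate systems, a hypothetical cloud CRM and a hypothetical on-premises ERP. A hypothetical `VirtualView` class answers queries by reading both sources at request time rather than copying their data into a new store. Commercial data virtualization engines add query optimization, caching, and security on top of this. The sketch models none of that.

```python
import sqlite3

# Two independent sources. In-memory SQLite databases stand in for a
# hypothetical cloud CRM and a hypothetical on-premises ERP system.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Acme", "EU"), (2, "Globex", "US")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])


class VirtualView:
    """Hypothetical view defined over sources; it stores no data of its own."""

    def __init__(self, name, fetch):
        self.name = name
        self._fetch = fetch  # callable that queries the sources on demand

    def rows(self):
        # Data is read from the sources at query time, so consumers see
        # current values and nothing is replicated into the view layer.
        return self._fetch()


def customer_revenue():
    # Push the aggregation down to the ERP source, fetch only names from
    # the CRM, then join the two partial results in the view layer.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return [(names[cid], total) for cid, total in totals if cid in names]


revenue = VirtualView("customer_revenue", customer_revenue)
print(revenue.rows())

# A change in a source is visible on the next query without any reload step.
erp.execute("INSERT INTO orders VALUES (13, 2, 25.0)")
print(revenue.rows())
```

The first query returns 350.0 for Acme and 75.0 for Globex. After the new order is inserted into the ERP source, Globex's total becomes 100.0 on the next call. No copy of either source had to be refreshed.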
Critical capabilities and characteristics of a logical data fabric include:
1. Augmentation of information and better collaboration using active metadata – Data marketplaces are important for users to find what they need in a self-service manner. Because a logical data fabric is built on a foundation of data virtualization, access to all kinds of metadata and activation of metadata-based machine learning are easier to build and deploy than in a physical data fabric. In a single-platform logical data fabric, the data catalog is tightly integrated with the underlying data delivery layer, which helps a broad set of users achieve fast data discovery and exploration.
Business stewards can create a catalog of business views based on metadata, classify them according to business categories, and assign them tags for easy access. With enhanced collaboration features, a logical data fabric also lets users endorse datasets or register comments or warnings about them. This helps all users contextualize dataset usage and better understand how their peers experience those datasets.
2. Seamless data integration in a hybrid or multi-cloud environment – Today, organizations have data spread across multiple clouds and on-premises data centers. Unlike a physical data fabric, which cannot synchronize two or more systems in real time, a logical data fabric provides business users and analysts with an enterprise-wide view of data without needing to replicate it.
A logical data fabric accesses data from multiple systems spread across multiple clouds and on-premises locations and integrates it in real time, in a way that is transparent to the user. And when a logical data fabric spans various clouds, on-premises data centers, and geographic locations, it is much easier to achieve semantic consistency, so that individuals at any location can use their preferred BI tool to query data.
3. Broader and better support for advanced analytics and data science use cases – Data scientists and advanced analytics teams often view data lakes as their playground. The latest trend, the data lakehouse, aims to let IT teams support BI analysts and line-of-business users as well as data scientists with a single data repository deployment. But lakehouses have inherent limitations. They require a lot of data replication, and they incur exorbitant egress charges when data is pulled out of them. It is also impractical to assume that one physical lakehouse can hold all of an enterprise's data, and the list goes on.
Because a logical data fabric provides seamless access to a wide variety of data sources and seamless connectivity to consuming applications, data scientists can work with a variety of models and tools, each choosing the ones they are most familiar with. A logical data fabric lets data scientists iterate quickly on data models and fine-tune them to better support their efforts. It also lets them spend less effort on data collection, preparation, and transformation, because the logical data fabric itself can handle these tasks.
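Capability 1 describes a catalog in which stewards classify business views by category and tag them, and users endorse views or leave comments and warnings. The following is a minimal in-memory Python sketch of that data structure. `CatalogEntry` and `DataCatalog` are hypothetical names for illustration, not any product's API. Ranking search results by endorsement count is an assumed design choice, not something the article prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    view_name: str
    category: str
    tags: set = field(default_factory=set)
    endorsements: set = field(default_factory=set)  # users who endorsed the view
    comments: list = field(default_factory=list)    # (user, kind, text) tuples


class DataCatalog:
    """Hypothetical in-memory catalog of business views (illustration only)."""

    def __init__(self):
        self._entries = {}
        self._by_tag = defaultdict(set)  # tag -> view names, for fast lookup

    def register(self, view_name, category, tags=()):
        entry = CatalogEntry(view_name, category, set(tags))
        self._entries[view_name] = entry
        for tag in entry.tags:
            self._by_tag[tag].add(view_name)
        return entry

    def get(self, view_name):
        return self._entries[view_name]

    def endorse(self, view_name, user):
        self._entries[view_name].endorsements.add(user)

    def comment(self, view_name, user, text, warning=False):
        kind = "warning" if warning else "comment"
        self._entries[view_name].comments.append((user, kind, text))

    def search(self, tag):
        # Assumed ranking: most-endorsed views first, then by name.
        matches = [self._entries[n] for n in self._by_tag.get(tag, ())]
        return sorted(matches, key=lambda e: (-len(e.endorsements), e.view_name))


catalog = DataCatalog()
catalog.register("customer_revenue", "Sales", tags={"revenue", "customer"})
catalog.register("regional_revenue", "Finance", tags={"revenue"})
catalog.endorse("regional_revenue", "alice")
catalog.endorse("regional_revenue", "bob")
catalog.endorse("customer_revenue", "alice")
catalog.comment("customer_revenue", "carol", "Excludes returns", warning=True)

print([e.view_name for e in catalog.search("revenue")])
```

Searching for the `revenue` tag lists `regional_revenue` before `customer_revenue`, because two peers endorsed the former and only one endorsed the latter. Carol's warning stays attached to the view for later readers.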
While these are some of the most important considerations for deploying a logical data fabric, there are other compelling reasons. For example, physical data fabrics cannot integrate streaming data with data at rest in real time for data consumers. With respect to data security, governance, and compliance, a physical data fabric can leave enterprise data prone to non-compliance with regulations such as the GDPR or the UK Data Protection Act. Data security rules cannot be centralized in a physical data fabric, which forces IT teams to rewrite them for each application and data source.
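The centralization argument can be illustrated with a minimal Python sketch. One PII masking rule is defined once and applied to every result set that passes through a single access layer, so it does not have to be rewritten in each application or source. `PII_COLUMNS`, `UNMASKED_ROLES`, `apply_policy`, and `query` are hypothetical names, and the masking scheme is an assumption. The sketch shows only where the rule lives. It is not a complete GDPR control.

```python
# Hypothetical central policy: which columns are PII and which roles may see them.
PII_COLUMNS = {"email", "phone"}
UNMASKED_ROLES = {"compliance_officer"}


def mask(value):
    # Reveal only the last two characters of longer values; fully mask short ones.
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 2) + text[-2:]


def apply_policy(rows, role):
    """Enforce the single, centrally defined PII rule on any result set."""
    if role in UNMASKED_ROLES:
        return rows
    return [
        {col: mask(val) if col in PII_COLUMNS else val for col, val in row.items()}
        for row in rows
    ]


def query(view_rows, role):
    # Every consumer (BI tool, notebook, application) goes through this one
    # entry point, so the rule is written once rather than per source or app.
    return apply_policy(view_rows, role)


customers = [{"name": "Acme", "email": "ops@acme.example", "region": "EU"}]
print(query(customers, role="analyst"))
print(query(customers, role="compliance_officer"))
```

An analyst sees the email address masked except for its last two characters. A compliance officer sees it in clear. Changing the policy in one place changes what every consumer sees.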
With all these considerations in mind, many Fortune 500 and Fortune 1000 companies are deploying a logical data fabric with data virtualization to make data available and self-serviceable for all their data consumers and stakeholders. Only with a logical data fabric can any organization truly democratize its data and empower all of its globally distributed data consumers.