Why is putting data to work such a challenge at so many companies? For one thing, companies have ever-increasing mounds of data. Typically, all that data lives in siloed databases or applications that don’t connect. The data is usually stored in different formats, and some of it is structured, while the rest is unstructured. Employees have access to some databases, but not others. Database talent, necessary to wrangle all that data, is finite and expensive. And security and compliance worries make data access and data governance complex.
Multiple technology approaches have emerged to help companies handle these and related data integration problems, including the data warehouse, data lake, data mesh, data virtualization, and, most recently, data fabric. In this article, we'll examine two contrasting technologies, the data warehouse and data virtualization, and discuss what their significant differences mean for how you can manage and tap into the value of your data.
[ Want to learn more about how to solve your data silo problems and speed up innovation? Get the eBook: The Data Fabric Advantage. ]
What does data virtualization mean? Picture your data in all the different data source systems it lives in, in all its different formats. Data virtualization is a virtualized architecture layer that “sits” on top of those data sources and connects them. (Note: This is distinct from “data visualization,” which refers to things like charts and graphs that help explain data.)
You can think of this virtualized layer as an abstraction layer: the development work typically needed to access the data, such as building API calls and data pipelines, isn't required. Real-time updates keep the data consistent between the source systems and the virtualized layer.
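To make the idea concrete, here is a minimal sketch of how such an abstraction layer behaves. Everything in it is hypothetical (the `VirtualLayer` class and the `crm`/`erp` stand-ins are invented for illustration, not any vendor's API): the layer answers queries by reading the live sources at query time, so nothing is copied and source updates appear immediately.

```python
# Hypothetical stand-ins for two live source systems.
crm = {"cust-1": {"name": "Ada"}}        # a CRM record store
erp = {"cust-1": {"balance": 120.0}}     # an ERP record store

class VirtualLayer:
    """A toy virtualized layer: one query interface over many sources."""

    def __init__(self, sources):
        # Map each source name to a lookup function against the live system.
        self.sources = sources

    def get(self, key):
        # Read from every source at query time and merge the results.
        # Because nothing is copied, a change in a source system is
        # visible on the very next query -- no pipeline, no migration.
        record = {}
        for lookup in self.sources.values():
            record.update(lookup(key) or {})
        return record

layer = VirtualLayer({"crm": crm.get, "erp": erp.get})
print(layer.get("cust-1"))               # unified view across both sources
erp["cust-1"]["balance"] = 80.0          # the source changes...
print(layer.get("cust-1")["balance"])    # ...and the layer reflects it: 80.0
```

The design choice to highlight: the layer holds connectors, not data, which is why there is no migration step and why freshness comes for free.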
Data virtualization is one aspect of data fabric, which is an architecture layer and toolset for connecting disparate data sets to create a unified view. Due to the virtualized data layer, you don’t need to migrate data from where it lives, say in a database, ERP, or CRM application. The data may be either on-premises or in a cloud service.
You’ll see the terms data virtualization and data fabric used interchangeably sometimes, but think of data fabric as being a bit broader (and more focused on making data usable). That data sitting in the virtualized layer has to be put into action somehow, and data fabric provides the tools to make that possible, so that you can connect, relate, and extend it.
A key point to remember about the data fabric or data virtualization approach is this: the data never actually moves. There’s no migration time or expense. Though the data remains in its source location, you can use it for analysis or to feed other applications. This is a significant difference compared to the data warehouse approach.
Whereas a data fabric connects data sets, a data warehouse only collects them. A data warehouse is a repository for structured data. With a data warehouse, you're extracting data from the source systems, transforming it to clean and deduplicate it, and loading it into the data warehouse. That means added operational overhead: more development time, ongoing maintenance and upkeep, and technical debt.
In reality, it takes a lot of time and human effort to get data from point A (or many point A’s) to point B in the warehouse. The data warehouse approach can also cause data integrity problems, since you are moving the original set of data and applying complex transformation logic.
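The extract, transform, load (ETL) steps described above can be sketched in a few lines. This is a deliberately tiny illustration, not a production pipeline: it uses in-memory SQLite tables as stand-ins for a source system and a warehouse, and the table and column names are invented for the example.

```python
import sqlite3

# Extract: a source system, here faked with an in-memory SQLite table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada ", "ada@example.com"),          # note the stray whitespace
     (2, "Grace", "grace@example.com"),
     (2, "Grace", "grace@example.com")],      # note the duplicate row
)
rows = source.execute("SELECT id, name, email FROM customers").fetchall()

# Transform: clean up the data (trim whitespace, drop duplicates).
# Real pipelines apply far more complex logic -- each rule is code
# someone must write, test, and maintain over time.
cleaned = sorted({(cid, name.strip(), email) for cid, name, email in rows})

# Load: write the cleaned copy into the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customers (id INTEGER, name TEXT, email TEXT)")
warehouse.executemany("INSERT INTO dim_customers VALUES (?, ?, ?)", cleaned)

print(warehouse.execute("SELECT COUNT(*) FROM dim_customers").fetchone()[0])  # 2
```

Even this toy version shows where the overhead comes from: the data now exists in two places, and every transformation rule is logic that can drift out of date as the source systems change.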
Finally, unlike data fabric, the data warehouse approach commonly forsakes giving users real-time data, because the extract and transform work is too slow and costly to run continuously. That's a significant disadvantage.
For more detail on this topic, see our related article: Data fabric vs. data mesh vs. data lake. (A data lake is similar to a data warehouse but is designed to also hold raw, unstructured data.)
These two approaches to data are opposites, but they do have some things in common.
Here’s what the concepts of data virtualization and data warehouse share:

- Both respond to the same underlying problem: data siloed across many source systems, in many formats.
- Both aim to give users a unified view of data that originates in different places.
- Both can supply data for analysis and feed it to other applications.
Note these important differences:

- A data warehouse copies data out of source systems through an extract, transform, load (ETL) process; a data virtualization layer leaves data where it lives and connects to it.
- A data warehouse holds structured data only; a virtualized layer can connect to data in different formats, whether on-premises or in a cloud service.
- A data warehouse typically can't deliver real-time data; a virtualized layer reflects changes in the source systems in real time.
- The warehouse approach carries migration time and expense, plus ongoing maintenance and technical debt; the virtualization approach avoids data movement entirely.
You’ve just read that using a data virtualization layer can increase development speed, but by how much? According to Gartner research, “Data fabric reduces time for integration design by 30%, deployment by 30%, and maintenance by 70%.” Because a virtualized data layer removes the need for data migration, you can begin using your data to develop powerful products and applications immediately.
Additionally, you won’t have to build API integrations unless you want to, since a data fabric built on a data virtualization layer already has a solution in place to get the data. A related option, data mesh, goes after the same problem as data fabric but leaves enterprises with a lot of API integration work and other time-intensive development work. Data mesh is more of a high-code solution than data fabric.
You can get even more speed and value from a data fabric approach when you combine it with a platform that includes no-code data modeling and record-level security.
You came here for three key facts about data virtualization and data warehouses. So take these lessons away:

1. A data warehouse collects data: it copies data out of source systems through ETL work that adds development time, maintenance, and technical debt. Data virtualization connects data, leaving it in place with no migration time or expense.
2. Data virtualization supports real-time data; the data warehouse approach typically does not.
3. A data fabric built on a data virtualization layer speeds development. Gartner estimates it reduces time for integration design by 30%, deployment by 30%, and maintenance by 70%.
[ How does data fabric fit into a modern automation strategy? Get the Gartner® Hyperautomation 2022 Trends Report. ]