5 Types of Data Integration You Need to Know
“You keep using that word. I do not think it means what you think it means.” – Inigo Montoya, The Princess Bride
Having a discussion about data integration might seem simple enough. However, the term can be interpreted quite differently, depending on the context. It’s very easy for a meeting to devolve into a confusing conversational swirl when terminology has a variety of meanings. That’s why it’s important that you know the common types of data integration when discussing it or approaching a data integration project.
Data Integration Definition: What is Data Integration?
Data integration in the purest sense is about carefully and methodically blending data from different sources, making it more useful and valuable than it was before. IBM provides a strong definition, stating “Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information.”
The key terms here are “combining data… into meaningful and valuable information.” That’s not just about moving data from one place to another or pouring several spouts of data into a single repository. It’s about making the data comprehensive and more easily usable.
Integrated View or Integration Version?
As for the “technical and business processes” mentioned in the definition, they fall into two broad groups. Some methods bring data together into an integrated view without moving it, while others physically bring data together into an integrated version. You can argue that both are a type of data integration, the main difference being whether the data was physically moved and/or manipulated. Below are five common data integration approaches.
#1 Data Consolidation
Data consolidation physically brings data together from several separate systems, creating a version of the consolidated data in one data store. Often the goal of data consolidation is to reduce the number of data storage locations. Extract, transform, and load (ETL) technology supports data consolidation.
ETL pulls data from sources, transforms it into a consistent format, and then loads it into another database or data warehouse. During the transform step, the process cleans and filters the data and applies business rules before the data populates the target.
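To make the three ETL steps concrete, here is a minimal sketch in Python. It assumes two hypothetical source systems (`CRM_ROWS` and `BILLING_ROWS`), an in-memory SQLite database as the target, and one illustrative business rule (every customer needs a unique, non-empty email). None of these names come from a real product; a production pipeline would typically use a dedicated ETL tool.

```python
import sqlite3

# Hypothetical raw records from two source systems with inconsistent formatting.
CRM_ROWS = [{"email": " Ann@Example.com ", "country": "us"},
            {"email": "bob@example.com", "country": "DE"}]
BILLING_ROWS = [{"email": "ann@example.com", "country": "US"},
                {"email": "", "country": "FR"}]


def extract():
    """Extract: pull raw rows from every source."""
    return CRM_ROWS + BILLING_ROWS


def transform(rows):
    """Transform: clean values, filter bad rows, and drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        email = row["email"].strip().lower()
        # Illustrative business rule: skip rows with no email or a repeated one.
        if not email or email in seen:
            continue
        seen.add(email)
        cleaned.append((email, row["country"].upper()))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the consolidated target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT PRIMARY KEY, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the full pipeline and return the consolidated rows."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT email, country FROM customers ORDER BY email"
    ).fetchall()
```

Calling `run_etl()` leaves one consolidated `customers` table in which the duplicate and the empty record have been filtered out.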
#2 Data Propagation
Data propagation is the use of applications to copy data from one location to another. It is event-driven and can be done synchronously or asynchronously. Most synchronous data propagation supports a two-way data exchange between the source and the target. Enterprise application integration (EAI) and enterprise data replication (EDR) technologies support data propagation.
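The difference between synchronous and asynchronous propagation can be sketched in a few lines of Python. This is a simplified illustration, not how any particular EAI or EDR product works: the `source` and `target` dictionaries stand in for two systems, and the `write` and `drain` functions are hypothetical names.

```python
import queue

source, target = {}, {}
pending = queue.Queue()  # buffer of change events awaiting asynchronous delivery


def write(key, value, synchronous=True):
    """Write to the source and propagate the resulting change event."""
    source[key] = value
    if synchronous:
        # Synchronous: the target is updated before the write returns.
        target[key] = value
    else:
        # Asynchronous: the event is queued and the target catches up later.
        pending.put((key, value))


def drain():
    """Deliver all queued change events to the target."""
    while not pending.empty():
        key, value = pending.get()
        target[key] = value
```

With `synchronous=False`, the source and target briefly disagree until `drain()` runs, which is the trade-off asynchronous propagation accepts in exchange for not blocking the writer.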
Enterprise Application Integration (EAI)
EAI integrates application systems for the exchange of messages and transactions. It is often used for real-time business transaction processing. Integration platform as a service (iPaaS) is a modern approach to EAI.
Enterprise Data Replication (EDR)
EDR typically transfers large amounts of data between databases, instead of applications. Database triggers and logs are used to capture and disseminate data changes between the source and remote databases.
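As a rough illustration of trigger-based change capture, the sketch below uses two in-memory SQLite databases. The table names (`products`, `change_log`), the trigger names, and the `replicate` function are all assumptions made for this example; commercial replication tools work at much larger scale and often read the database's own transaction log instead.

```python
import sqlite3

# Source database: triggers record every insert and update in a change log.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, price REAL
);
CREATE TRIGGER log_insert AFTER INSERT ON products
BEGIN
    INSERT INTO change_log (id, price) VALUES (NEW.id, NEW.price);
END;
CREATE TRIGGER log_update AFTER UPDATE ON products
BEGIN
    INSERT INTO change_log (id, price) VALUES (NEW.id, NEW.price);
END;
""")

# Remote database that should mirror the source table.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")


def replicate(last_seq=0):
    """Apply logged changes newer than last_seq; return the new high-water mark."""
    rows = source.execute(
        "SELECT seq, id, price FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, product_id, price in rows:
        replica.execute(
            "INSERT OR REPLACE INTO products (id, price) VALUES (?, ?)",
            (product_id, price),
        )
        last_seq = seq
    replica.commit()
    return last_seq
```

Passing the returned sequence number into the next `replicate` call means only new changes are shipped each time, rather than the whole table.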
#3 Data Virtualization
Virtualization uses an interface to provide a near real-time, unified view of data from disparate sources with different data models. Data can be viewed in one location but is not stored in that single location. Data virtualization retrieves and interprets data but does not require uniform formatting or a single point of access.
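A toy version of this idea in Python: the function below reads two differently formatted sources on demand and returns a combined view without copying anything into a new store. The sample data (`ORDERS_CSV`, `TICKETS_JSON`) and the `customer_view` function are hypothetical and only meant to show the shape of the approach.

```python
import csv
import io
import json

# Hypothetical sources that stay where they are, each in its own format.
ORDERS_CSV = "order_id,customer,total\n1,ann,20.0\n2,bob,35.5\n"
TICKETS_JSON = '[{"ticket": "T-1", "user": "ann", "status": "open"}]'


def customer_view(name):
    """Build a unified view for one customer by querying each source at read time."""
    orders = [
        row for row in csv.DictReader(io.StringIO(ORDERS_CSV))
        if row["customer"] == name
    ]
    tickets = [t for t in json.loads(TICKETS_JSON) if t["user"] == name]
    # Nothing is persisted: the view exists only for the duration of the call.
    return {"customer": name, "orders": orders, "tickets": tickets}
```

Each source keeps its own field names and format here; the view simply presents them side by side.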
#4 Data Federation
Federation is technically a form of data virtualization. It uses a virtual database and creates a common data model for heterogeneous data from different systems. Data is brought together and viewable from a single point of access. Enterprise information integration (EII) is a technology that supports data federation. It uses data abstraction to provide a unified view of data from different sources. That data can then be presented or analyzed in new ways through applications.
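The common data model is what sets federation apart, and a short Python sketch can show it. Here a hypothetical `Customer` class plays the role of the common model, two small adapter functions map each source's fields onto it, and `query_customers` acts as the single point of access. All names and sample records are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The common data model shared by every source."""
    customer_id: str
    name: str
    source: str


# Hypothetical source systems with different field names and ID types.
CRM = [{"cust_no": 7, "full_name": "Ann Lee"}]
SHOP = [{"uid": "s-42", "first": "Bob", "last": "Ray"}]


def from_crm():
    """Map CRM records onto the common model."""
    return [Customer(str(r["cust_no"]), r["full_name"], "crm") for r in CRM]


def from_shop():
    """Map shop records onto the common model."""
    return [Customer(r["uid"], f'{r["first"]} {r["last"]}', "shop") for r in SHOP]


def query_customers(predicate=lambda c: True):
    """Single point of access: query every source and return common-model rows."""
    return [
        customer
        for adapter in (from_crm, from_shop)
        for customer in adapter()
        if predicate(customer)
    ]
```

For example, `query_customers(lambda c: c.source == "shop")` returns only the shop customers, already in the common shape, so the caller never deals with source-specific field names.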
Virtualization and federation are good workarounds for situations where data consolidation is cost prohibitive or would cause too many security and compliance issues.
#5 Data Warehousing
Warehousing is included in this list because it is a commonly used term, although its meaning is more generic than the other methods mentioned above. Data warehouses are storage repositories for data. However, the term “data warehousing” implies the cleansing, reformatting, and storage of data, which is essentially data integration.
Note from the Editor: This blog was originally published December 6th, 2017. It has been updated with the latest information.