Consistent access and delivery with Data Integration

Data integration patterns help create a unified, accurate, and consistent view of enterprise data within an organization. This data is often dissimilar, living in different locations and being stored in a variety of formats. 

The approaches used to achieve data integration goals will depend largely on the Quality-of-Service (QoS) and usage characteristics surrounding different sets of data. A data integration strategy helps to logically—and perhaps also physically—combine different sets of enterprise data sources to expose the data services needed by your organization.

Understanding data integration patterns and using them effectively can help organizations create an effective data integration strategy. In the sections that follow, we will detail these patterns.

Legacy data gateways for microservices

Application architecture evolution has fragmented the backend implementation into independent microservices and functions. However, there is still a gap in the way this evolution has dealt with data, because it tends to avoid dealing with stateful components.

At the same time, microservices encourage developers to create new polyglot data persistence layers that then need to be composed to deliver business value. How can we apply the knowledge from API gateways to these new data stores?

In this discussion of legacy data, Hugo Guerrero talks about the behavior of data gateways and API gateways, the different data gateway types and their architectures, and the extended data-proxy for hybrid cloud deployments.

Pattern 1. Data consolidation

Data consolidation involves designing and implementing a data integration process that feeds a datastore with complete, enriched data. This approach allows for data restructuring, a reconciliation process, thorough cleansing, and additional steps for aggregation and further enrichment.

Extract, transform, load (ETL)

ETL offloads the transformation of raw data into usable data from the target datastore, since the transformation happens before the data is loaded. This transformation process can become a bottleneck, and in cloud computing there is no added benefit, such as a reduction in target server load.
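
As a rough illustration only, the following Java sketch runs a small ETL pass over JDBC. The connection URLs, the source_orders and fact_orders tables, and the currency conversion are hypothetical placeholders, not a prescribed implementation.

// Minimal ETL sketch (hypothetical tables and connection details):
// extract rows from a source database, transform them in the integration
// layer, then load the cleaned result into the target datastore.
import java.sql.*;

public class OrdersEtlJob {
    public static void main(String[] args) throws SQLException {
        try (Connection source = DriverManager.getConnection("jdbc:postgresql://source-db/sales", "etl", "secret");
             Connection target = DriverManager.getConnection("jdbc:postgresql://target-dw/analytics", "etl", "secret");
             Statement extract = source.createStatement();
             ResultSet rows = extract.executeQuery("SELECT id, amount, currency FROM source_orders");
             PreparedStatement load = target.prepareStatement(
                     "INSERT INTO fact_orders (order_id, amount_usd) VALUES (?, ?)")) {

            while (rows.next()) {
                // Transform: normalize every amount to USD before it reaches the target.
                double amountUsd = toUsd(rows.getDouble("amount"), rows.getString("currency"));
                load.setLong(1, rows.getLong("id"));
                load.setDouble(2, amountUsd);
                load.addBatch();
            }
            load.executeBatch(); // Load the transformed batch into the target datastore.
        }
    }

    // Placeholder conversion; a real job would look up current exchange rates.
    private static double toUsd(double amount, String currency) {
        return "USD".equals(currency) ? amount : amount * 1.1;
    }
}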

Extract, load, transform (ELT)

ELT is highly scalable: store as much data as you need in its raw form and get it to the target quickly. No specialized transformation infrastructure is needed before the data lands at its destination.

Pattern 2. Data federation

Data federation uses a pull approach where data is retrieved from the underlying source systems on-demand. This pattern provides real-time access to data. Data federation creates a virtualized view of the data with no data replication or moving of the source system data.

Composite service

A composite service implements the aggregator pattern. It combines the data from distinct services in a meaningful way and serves the combined response to the consuming application.
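
A minimal sketch of a composite service in Java, assuming two hypothetical REST endpoints (customer-service and order-service) that return JSON: the composite calls both on demand and hands one combined payload back to the consumer.

// Composite service sketch: aggregate two backend responses into one view.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CustomerOrdersComposite {
    private final HttpClient http = HttpClient.newHttpClient();

    public String customerWithOrders(String customerId) throws Exception {
        String customer = fetch("http://customer-service/customers/" + customerId);
        String orders = fetch("http://order-service/orders?customerId=" + customerId);
        // Aggregate both responses into a single payload for the consuming application.
        return "{\"customer\":" + customer + ",\"orders\":" + orders + "}";
    }

    private String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}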

Data virtualization or Enterprise Information Integration (EII)

Data virtualization (EII) combines large sets of diverse data sources in a way that makes them appear to a data consumer as a single, uniform data source. It uses data abstraction to provide a common data access layer.
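
The Java sketch below only illustrates the idea of a common data access layer; the CustomerSource interface and its adapters are hypothetical, and a real virtualization platform would also handle query pushdown, security, and caching.

// Data-virtualization sketch: the consumer sees one uniform source, while the
// data stays in place in the underlying systems and is fetched on demand.
import java.util.ArrayList;
import java.util.List;

interface CustomerSource {
    List<String> findCustomersByRegion(String region);
}

public class VirtualCustomerView {
    private final List<CustomerSource> sources; // e.g. a CRM adapter and a billing-system adapter

    public VirtualCustomerView(List<CustomerSource> sources) {
        this.sources = sources;
    }

    // Each call fans out to the real systems; no source data is replicated.
    public List<String> findCustomersByRegion(String region) {
        List<String> combined = new ArrayList<>();
        for (CustomerSource source : sources) {
            combined.addAll(source.findCustomersByRegion(region));
        }
        return combined;
    }
}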

Pattern 3. Data propagation

Data propagation involves distributing data updates on two levels. At the application level, an event in the source application triggers processing in one or more target applications. At the datastore level, an event in the source system triggers updates in the source datastore. These change events are then replicated in near real-time to one or more target datastores.

Enterprise Application Integration (EAI)

EAI is distributed, lightweight, and scalable for elastic operating environments—the integration itself may be deployed as a containerized application.

Enterprise Data Replication (EDR)

In distributed and microservices architectures, replication allows applications to be more reliable. 

The data a service needs can be replicated and colocated with that service, stored in a form that is more usable by that particular service. This reduces overhead and latency.

Red Hat build of Apache Camel

Apache Camel is an open source integration framework that implements enterprise integration patterns (EIPs) with mature, robust, ready-to-use building blocks, enabling developers to rapidly create data flows and easily test and maintain them.
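
As a hedged illustration of the Java DSL, the route below applies the content-based router EIP. The file and JMS endpoint URIs are placeholders and would map to whatever systems your flow actually connects (the JMS and XPath components must be on the classpath for this to run).

// Camel route sketch: route incoming order files by priority.
import org.apache.camel.builder.RouteBuilder;

public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:orders/incoming")                       // consume new order files
            .choice()                                      // content-based router EIP
                .when(xpath("/order/@priority = 'high'"))
                    .to("jms:queue:priority-orders")
                .otherwise()
                    .to("jms:queue:standard-orders");
    }
}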

Data integration common practices

Change data capture

Change data capture (CDC) detects data change events in a source datastore and triggers an update process in another datastore or system. CDC is usually implemented as trigger-based or log-based. In the trigger-based approach, transaction events are logged in a separate shadow table that can be replayed to copy those events to the target system on a regular basis. Log-based CDC, also known as transaction log tailing, identifies data change events by scanning transaction logs. This approach is often used because it can be applied to many data change scenarios and, thanks to its minimal overhead, can support systems with extremely high transaction volumes.
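
The sketch below illustrates the trigger-based approach in plain JDBC; the orders_shadow table, its columns, and the checkpointing are assumptions made for the example, and a production setup would more often rely on a log-based tool such as Debezium.

// Trigger-based CDC sketch: database triggers append change events to a shadow
// table; this job replays events newer than the last processed id and applies
// them to the target system.
import java.sql.*;

public class ShadowTableReplayer {
    public void replay(Connection source, Connection target, long lastProcessedId) throws SQLException {
        String query = "SELECT id, op, row_key, payload FROM orders_shadow WHERE id > ? ORDER BY id";
        try (PreparedStatement ps = source.prepareStatement(query)) {
            ps.setLong(1, lastProcessedId);
            try (ResultSet events = ps.executeQuery()) {
                while (events.next()) {
                    applyToTarget(target, events.getString("op"),
                            events.getString("row_key"), events.getString("payload"));
                    lastProcessedId = events.getLong("id"); // checkpoint after each event
                }
            }
        }
    }

    private void applyToTarget(Connection target, String op, String key, String payload)
            throws SQLException {
        // Insert, update, or delete the corresponding row in the target datastore.
        // Details depend on the target schema and are omitted in this sketch.
    }
}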

Event sourcing

Event sourcing is a pattern that ensures all changes to an application's state are stored as a sequence of events. These events can then be used for temporal queries, allowing past states to be reconstructed and activity to be replayed. This pattern is useful for creating audit logs, for debugging, and for use cases that require reconstructing the state at a specific point in time.
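
A minimal event-sourcing sketch in Java with a hypothetical Account aggregate: state is never overwritten, and any past balance can be recovered by replaying a prefix of the event log.

// Event-sourcing sketch: the append-only event list is the source of truth.
import java.util.ArrayList;
import java.util.List;

public class Account {
    sealed interface Event permits Deposited, Withdrawn {}
    record Deposited(long amount) implements Event {}
    record Withdrawn(long amount) implements Event {}

    private final List<Event> events = new ArrayList<>(); // the audit log comes for free

    public void deposit(long amount)  { events.add(new Deposited(amount)); }
    public void withdraw(long amount) { events.add(new Withdrawn(amount)); }

    // Replay the first `upTo` events to reconstruct the balance at that point in time.
    public long balanceAfter(int upTo) {
        long balance = 0;
        for (Event e : events.subList(0, upTo)) {
            if (e instanceof Deposited d) balance += d.amount();
            else if (e instanceof Withdrawn w) balance -= w.amount();
        }
        return balance;
    }

    public long currentBalance() { return balanceAfter(events.size()); }
}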

Streaming data and event stream processing

Event stream processing (ESP) involves taking action on a series of data points that originate from a system that continuously creates data. In this context, an event is a data point in the system and the stream is the continuous delivery of those events. This series of events is also referred to as streaming data. The types of actions taken as a result of these events include aggregations, analytics, transformations, enrichment, and ingestion into another datastore.
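
As one possible illustration, the Kafka Streams sketch below consumes a continuous stream of events, transforms each one, and writes the result to another topic. The topic names, the bootstrap address, and the toUpperCase stand-in enrichment are assumptions made for the example.

// Event stream processing sketch with Kafka Streams.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class PaymentStreamProcessor {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("payments")
               .filter((key, value) -> value != null)          // drop malformed events
               .mapValues(value -> value.toUpperCase())        // stand-in for real enrichment
               .to("payments-normalized");                     // hand off to downstream consumers

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-stream-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}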

Distributed caching and in-memory data grids

Caching provides temporary storage for data so that future requests for that data can be served more quickly. Data is placed in a cache because it is frequently accessed or because it is a duplicated copy of data stored in another datastore. The overarching goal of caching is to improve performance.
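
A minimal cache-aside sketch in Java using a single in-process map; a distributed in-memory data grid such as Infinispan plays the same lookup-or-load role across many nodes. The backingStore function stands in for whatever slower datastore sits behind the cache.

// Cache-aside sketch: serve repeated reads from memory, fall back to the
// backing datastore on a miss, and keep the copy for future requests.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class CacheAsideRepository<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> backingStore; // e.g. a database lookup

    public CacheAsideRepository(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public V get(K key) {
        // Frequently accessed entries are served from memory.
        return cache.computeIfAbsent(key, backingStore);
    }

    public void invalidate(K key) {
        cache.remove(key); // call when the underlying data changes
    }
}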

Data integration use cases

Data replication

CDC can be used for data replication to multiple databases, data lakes, or data warehouses, to ensure each resource has the latest version of the data. In this way, CDC can provide multiple distributed and even siloed teams with access to the same up-to-date data.

Auditing

Facing today's strict data compliance requirements and heavy penalties for noncompliance, it is essential to keep a history of changes made to your data. CDC can be used to save data changes for auditing or archiving requirements.

Microservice data exchange

CDC can be used to sync microservices with monolithic applications, enabling the seamless transfer of data changes from legacy systems to microservices-based applications.

Mono-to-micro Strangler Pattern

Through an incremental approach, you can take scoped components and move them to a new microservices architecture. Use CDC to stream changes from the monolithic database over to the microservices database and the other way around.

Battle of the in-memory data stores

Have you ever wondered about the relative differences between two of the more popular open source in-memory data stores and caches? A cache is a smaller, faster storage layer inserted between an application and a primary datastore that keeps its data on disk, while an in-memory data store depends on machine memory to store retrievable data.

In this DevNation Tech Talk, the DevNation team describes those differences and, more importantly, provides live demonstrations of the key capabilities that could have a major impact on your architectural decisions.

Get started with hands-on data integration

Interactive Tutorial

Solution Pattern: Recommendation Engine using Event Streaming

An event streaming platform using Red Hat Streams for Apache Kafka based on...

Interactive Tutorial

Solution Pattern: Modernize your stack by adopting Change Data Capture

Extend capabilities with no changes to legacy apps through data integration...

Interactive Tutorial

Solution Pattern: Manage and Secure APIs with an API First Approach

Discover how an API First Approach provides the right framework to build APIs...

Interactive Tutorial

Solution Pattern: Fuse to Apache Camel migration

An accelerated path to migrating applications from Red Hat Fuse to Red Hat...

Interactive Tutorial

Solution Pattern: Event-driven intelligent applications

Event-driven Sentiment Analysis using Kafka, Knative and AI/ML

Interactive Tutorial

Solution Pattern: Event Driven API Management

Expand your API Management strategy beyond RESTful APIs into event-driven...

More integration resources

Article
Dec 20, 2024

Our top application development articles of 2024

Explore this year's most popular articles on Kafka data storage and...

Article
Dec 12, 2024

What’s new in Red Hat build of Apache Camel 4.8

Ivo Bek

Red Hat build of Apache Camel 4.8 brings enhancements in contract-first API...

Article
Dec 04, 2024

Level up your generative AI with LLMs and RAG

Ritesh Shah

Learn how a developer can work with RAG and LLM leveraging their own data...

Article
Nov 26, 2024

Try Apache Camel: From concept to deployment on OpenShift

Bruno Meseguer

This article will guide you through the process of rapid prototyping using...

Article
Oct 25, 2024

5 ways to leverage Event-Driven Ansible for your platform

Vivien Wang

Discover 5 potential use cases for Event-Driven Ansible, from network...

Article
Oct 04, 2024

Tutorial: Tool up your LLM with Apache Camel on OpenShift

Bruno Meseguer

This tutorial gives you a unique chance to learn, hands-on, some of the...

Article
Jul 22, 2024

Try OpenShift AI and integrate with Apache Camel

Bruno Meseguer

This article explains how to use Red Hat OpenShift AI in the Developer...

Article
May 24, 2024

Implement AI-driven edge to core data pipelines

Bruno Meseguer

The Edge to Core Pipeline Pattern automates a continuous cycle for releasing...

Article
May 22, 2024

Configure SOAP web services with Apache Camel on Quarkus

Luis Falero Otiniano

Explore how to integrate SOAP and REST services using Quarkus and Apache Camel.

Article
Mar 27, 2024

Migrating from Red Hat Fuse to Red Hat build of Apache Camel

Michael Thirion +1

Discover how to simplify your migration path to the Red Hat build of Apache...