Developers often use disk-based databases (PostgreSQL, MongoDB, Oracle) as the single source of truth for their data because these databases offer widely adopted programming models. However, despite their popularity, most suffer from one fundamental problem: they get slower as more data is stored. To mitigate this, Redis is used as a caching layer that speeds up read queries and takes considerable load off the database. This approach also saves companies money by eliminating the need for expensive read replicas. But how do you continuously move data from the database to Redis without writing tons of code, stitching together different distributed systems, and wasting lots of time?
You can use Redis Data Integration (RDI) for this. RDI keeps Redis up to date with any changes made in a source database, using a Change Data Capture (CDC) mechanism. After performing an initial snapshot of the source database and moving the existing data, it captures subsequent database changes as data streams and stores them in the RDI database, where they are processed as they arrive. Transformations are supported and applied before the final data is written to the target database. The best part? All of this is configuration, not coding!
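To give a flavor of what that configuration looks like, here is a rough sketch of the source side of an RDI pipeline definition. The structure follows the general RDI layout, and all values are placeholders rather than anything this repository ships:

sources:
  psql:
    type: cdc
    connection:
      type: postgresql
      host: my-postgres-host   # placeholder: your source database host
      port: 5432
      database: postgres
      user: postgres
      # the password is normally injected from a secret rather than written inline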
RDI is designed to support apps that use a disk-based database as the system of record but that also need to be fast and scalable. This is a common requirement for mobile, web, and AI apps with a rapidly growing number of users: the performance of the central database is acceptable at first, but it will soon struggle to handle the increasing demand without a cache.
This repository demonstrates how to install, deploy, and use RDI with a fairly realistic use case. You start with a PostgreSQL database running on-premises containing an e-commerce dataset, and you use RDI to continuously move data to a Redis database running on Redis Cloud. You will need the following tools and services:
- Docker: https://docs.docker.com/get-started/get-docker
- Kubernetes: https://kubernetes.io/releases/download
- Helm charts: https://helm.sh/docs/intro/install
- Terraform: https://developer.hashicorp.com/terraform/install
- Redis Insight: https://redis.io/insight
- Redis Cloud: https://redis.io/try-free
To deploy RDI, you'll need a Kubernetes (K8S) cluster. The deployment workflow ensures all dependencies (ingress, database, and RDI) are managed and deployed in the correct order, with secure configuration and easy cleanup. Though you can use any K8S distribution, you don't need a production-ready cluster; any local K8S deployment will suffice. Development clusters such as minikube, kind, or Docker Desktop will do just fine.
However, be mindful of the resources you dedicate to your K8S cluster. To run this demo smoothly, you must dedicate at least 4 CPUs, 8 GB of memory, and 25 GB of disk to the underlying infrastructure that runs your cluster. Anything less will cause the pods to crash and be recreated continuously, making your K8S cluster unstable. Note that these hardware requirements are not for the host machine that runs your cluster, but for the cluster itself. Once your K8S cluster is running, the RDI deployment is fully automated using the scripts in the rdi-deploy folder.
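For example, if you use minikube, you can size the cluster when you create it (the flags below are standard minikube options; adjust them to your machine):

minikube start --cpus 4 --memory 8192 --disk-size 25g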
This option deploys RDI on K8S along with its backend database. It is ideal if you want a batteries-included installation of RDI. It also saves you from spinning up a database on Redis Cloud, which can incur costs.
To deploy RDI with a local database, open a terminal and run:
cd rdi-deploy
./rdi-deploy-localdb.sh
This script will:
- Install the NGINX ingress controller using Helm
- Create the rdi namespace in your K8S cluster
- Deploy Redis Enterprise and custom resources
- Deploy the RDI database and wait for it to be ready
- Download the RDI Helm chart if not already present
- Extract connection details from Kubernetes secrets
- Generate a secure JWT key to be used with the RDI API
- Create a custom rdi-values.yaml for the Helm deployment
- Install RDI using Helm with the generated values
The script waits up to 5 minutes for most of the resources it creates. If a resource takes longer than that, the script will halt. If this happens, don't worry: just execute the script again, and it will pick up where it left off.
To monitor the deployment:
helm list -n rdi
You should see an output similar to this:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
default rdi 1 2025-09-02 18:27:13.348276876 +0000 UTC deployed pipeline-0.0.0 0.0.0
rdi rdi 1 2025-09-02 14:27:02.228638 -0400 EDT deployed rdi-1.14.0
List the deployed pods to check their statuses:
kubectl get pod -n rdi
You should see an output similar to this:
NAME READY STATUS RESTARTS AGE
collector-api-66f58f58c7-kl6ph 1/1 Running 0 30s
rdi-api-76f894cc77-xh4sp 1/1 Running 0 37s
rdi-metrics-exporter-6656695547-46dqq 1/1 Running 0 37s
rdi-operator-7c994f8fc8-dfdhb 1/1 Running 0 37s
rdi-reloader-546c9cd849-9wn6z 1/1 Running 0 37s
redis-enterprise-cluster-0 2/2 Running 0 2m12s
redis-enterprise-cluster-services-rigger-5f698b9c75-ftw7g 1/1 Running 0 2m13s
redis-enterprise-operator-7cdbb7cfb8-vwd5h 2/2 Running 0 2m18s
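If you also want to confirm that the RDI metadata database reached an active state, you can query the Redis Enterprise database custom resource (redb is the short name for RedisEnterpriseDatabase; the exact resource name depends on what the script created):

kubectl get redb -n rdi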
To undeploy and clean up all resources, run:
cd rdi-deploy
./rdi-undeploy-localdb.sh
This option deploys RDI on K8S and a backend database running on Redis Cloud. This is ideal if you want an RDI installation with a database that can scale to your needs, especially if you plan to extend this workload to perform intensive data processing from a source database.
Before using this option, make sure you:
- Create a Redis Cloud account if you don't have one. Note that creating an account takes only 5 minutes.
- Create Redis Cloud API keys and make them available as environment variables before running the script.
- Set a payment method in your account. You won't be charged until you create a database.
To make your Redis Cloud API keys available as environment variables, you must export the following ones:
export REDISCLOUD_ACCESS_KEY=<THIS_IS_GOING_TO_BE_YOUR_API_ACCOUNT_KEY>
export REDISCLOUD_SECRET_KEY=<THIS_IS_GOING_TO_BE_ONE_API_USER_KEY>
You also need to customize some Terraform variables. Please update the file rdi-deploy/terraform.tfvars and change the values of the following variables:
- payment_card_type
- payment_card_last_four
- essentials_plan_cloud_provider
- essentials_plan_cloud_region
You can leave everything else unchanged, unless you want to customize it further. Once you have done this, open a terminal and run:
cd rdi-deploy
./rdi-deploy-clouddb.sh
This script will:
- Install the NGINX ingress controller using Helm
- Create the rdi namespace in your K8S cluster
- Initialize and apply Terraform to create a Redis Cloud database
- Download the RDI Helm chart if not already present
- Extract connection details from Terraform outputs and variables
- Generate a secure JWT key to be used with the RDI API
- Create a custom rdi-values.yaml for the Helm deployment
- Install RDI using Helm with the generated values
The script waits up to 5 minutes for most of the resources it creates. If a resource takes longer than that, the script will halt. If this happens, don't worry: just execute the script again, and it will pick up where it left off.
To monitor the deployment:
helm list -n rdi
You should see an output similar to this:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
default rdi 1 2025-09-02 18:32:27.590886466 +0000 UTC deployed pipeline-0.0.0 0.0.0
rdi rdi 1 2025-09-02 14:32:15.29309 -0400 EDT deployed rdi-1.14.0
List the deployed pods to check their statuses:
kubectl get pod -n rdi
You should see an output similar to this:
NAME READY STATUS RESTARTS AGE
collector-api-66f58f58c7-jw954 1/1 Running 0 36s
rdi-api-76f894cc77-v4q57 1/1 Running 0 44s
rdi-metrics-exporter-6656695547-pzfsk 1/1 Running 0 44s
rdi-operator-7c994f8fc8-vsrwh 1/1 Running 0 44s
rdi-reloader-546c9cd849-6vmcm 1/1 Running 0 44s
To undeploy and clean up all resources, run:
cd rdi-deploy
./rdi-undeploy-clouddb.sh
This project contains a PostgreSQL database with an e-commerce dataset that will be used as the source data. You must get this database up and running to play with this demo. In the source-db folder, you will find a Docker Compose file that spins up the database and loads it with data, as well as an instance of pgAdmin you can use to access the database.
- Open a terminal and navigate to the source-db directory:
cd source-db
- Start the database and pgAdmin services:
docker compose up -d
This will:
  - Start a PostgreSQL container with Debezium support
  - Load initial data from scripts/initial-load.sql
  - Expose the PostgreSQL database over port 5432
  - Start pgAdmin on port 8888 (web interface)
- Verify the containers are running:
docker compose ps
- Access pgAdmin in your browser at http://localhost:8888
  - Email: admin@postgres.com
  - Password: pgadmin4pwd
- The PostgreSQL database is accessible at:
  - Host: localhost
  - Port: 5432
  - User: postgres
  - Password: postgres
  - Database: postgres
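If you prefer the command line over pgAdmin, you can also check that the dataset was loaded by running psql inside the container; this assumes the Compose service is named postgres (check docker compose ps or the Compose file if it differs):

docker compose exec postgres psql -U postgres -d postgres -c "\dt"

This should list the e-commerce tables created by scripts/initial-load.sql.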
Once you are done with this demo, you can stop the services:
docker compose down
The target database is the Redis database that will receive the data from RDI. In this use case, it represents the database from which your application reads data, even though that data was originally written to the PostgreSQL database. You are going to create this database on Redis Cloud using Terraform. The target database is slightly different from the one used by RDI: it requires fewer resources and doesn't need persistence enabled. For this reason, the Terraform code used here does not require a paid database on Redis Cloud; it uses the free plan available to all Redis Cloud users.
To create the target Redis database:
- Open a terminal and navigate to the target-db directory:
cd target-db
- Edit the file target-db/terraform.tfvars and update the variables essentials_plan_cloud_provider and essentials_plan_cloud_region with the options of your choice.
- Initialize Terraform:
terraform init
- Apply the Terraform configuration to create the database:
terraform apply -auto-approve
- After completion, you can view the connection details:
terraform output
This will show the host and port of your new target database.
Once you are done with this demo, you can destroy the database:
terraform destroy -auto-approve
Now that everything has been properly deployed, you can start the fun part, which is using RDI to stream data changes from the source database to the target database.
In this section, you will:
- Investigate your current dataset using pgAdmin.
- Use Redis Insight to access your RDI deployment.
- Deploy an RDI pipeline to stream data to Redis.
- Use Redis Insight to verify that the data is available.
Open a browser and navigate to http://localhost:8888.
Log in using:
- Email: admin@postgres.com
- Password: pgadmin4pwd
Once you have logged in, you will see the object explorer. The first time you access it, you will need to register the source database. Use the following values:
- Host: postgres (this is not a typo: you should not use localhost here)
- Port: 5432
- User: postgres
- Password: postgres
Then, navigate to Postgres > Databases > postgres > Schemas > public > Tables. You should see the following tables:
This means you are ready to start the data streaming process. Open Redis Insight, then click Redis Data Integration. You should see the following screen:
Click the Add RDI Endpoint button. The following screen will show up:
Fill in the form with the values shown in the picture above. As for the password, you can retrieve it from the file rdi-deploy/rdi-values.yaml: the value set in the connection.password field is what you should use to register the RDI endpoint. Note that rdi-deploy/rdi-values.yaml is created when you deploy RDI; if the file doesn't exist, return to the section Deploying RDI.
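For reference, the relevant part of rdi-deploy/rdi-values.yaml looks roughly like the sketch below. The exact keys and values are generated by the deployment script, so treat this only as a pointer to where connection.password lives:

connection:
  host: <host of the RDI database>
  port: <port of the RDI database>
  password: <the value to use when registering the RDI endpoint>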
💡 Tip: Depending on which K8S cluster you are running, Redis Insight may not be able to access your RDI deployment using https://localhost. This is certainly the case with minikube. If you're using minikube, you must manually create a tunnel to expose the RDI APIs outside the cluster. Open a new terminal and run minikube tunnel. It will ask for your host password. Leave the terminal open for the duration of the demo. You can find more information about this here.
Once you access your RDI endpoint, you can start configuring your pipeline. For this step, use the configuration available in the file pipeline-config.yaml and add it to the pipeline editor.
💡 Tip: Depending on which K8S cluster you are running, RDI may not be able to access your PostgreSQL database using localhost. This is certainly the case with minikube and kind. If you're using minikube, replace localhost with host.minikube.internal. If you're using kind, replace localhost with host.docker.internal.
Replace the values of the variables ${REDIS_DATABASE_HOST} and ${REDIS_DATABASE_PORT} with the values from the target database you created on Redis Cloud. You can retrieve these values from your Redis Cloud account using the console. However, an easier way is to run the command terraform output in the target-db folder. You should see an output similar to this:
redis_database_host = "redis-00000.c84.us-east-1-2.ec2.redns.redis-cloud.com"
redis_database_port = "00000"
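Assuming the repo's pipeline-config.yaml follows the standard RDI layout for targets, the target connection would end up looking roughly like this after the substitution (the default target name and exact fields may differ in the actual file):

targets:
  target:
    connection:
      type: redis
      host: redis-00000.c84.us-east-1-2.ec2.redns.redis-cloud.com   # value of redis_database_host
      port: 00000                                                   # value of redis_database_port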
Once you have finished updating the variables, go ahead and deploy the pipeline. This process may take a few minutes, as RDI performs an initial snapshot of the source database to create a data stream for each table it finds and start the streaming. Once the process finishes, you can navigate to the Pipeline Status tab to check the status of your pipeline.
The pipeline should report 78 records inserted into the target database, with a separate counter for each source table (each table is treated as a data stream). This means your RDI deployment is working as expected. At this point, whatever data you write into the source database will instantly stream to Redis. You can verify this by accessing your target database in Redis Cloud.
Access your database using the Redis Cloud console. Then click Connect and select Open in desktop.
This should open your target database on Redis Insight, allowing you to visualize your data.
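If you prefer the command line, you can also point redis-cli at the target database. The flags below are standard redis-cli options; the password is the default user password shown in the Redis Cloud console, and you may need to add --tls if your database has TLS enabled:

redis-cli -h <redis_database_host> -p <redis_database_port> -a <password> DBSIZE

DBSIZE should report the same number of keys you see in Redis Insight.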
These 78 keys are the result of the initial snapshot RDI performs on the source database to create the respective data streams. Once the streams are created, any data written to the source database emits an event that RDI captures and streams into the target database. This includes any INSERT, UPDATE, and DELETE operations. To verify this, you can use the script demo-multiple-users.sql, which adds roughly 50 users to the user table.
One cool feature of RDI that you can leverage is the ability to transform data as it is streamed into the target database. You can create one or more job files that will be used along with the data pipeline during the data streaming. Let's practice this with one example.
First, go to Redis Insight and stop and reset your data pipeline. This will allow you to change your data pipeline without streaming any data. Click the plus sign under Add transformation jobs in Pipeline Management. Name this job custom-job and define it using the code available in the file custom-job-v2.yaml.
This job performs three operations, all related to the user table. First, it changes the output data type from Hashes to JSON. Second, it adds a new field to the target called display_name, whose value is first_name and last_name concatenated. Third, it adds another field called user_type with two possible values, internal or external, depending on the domain of the user's email address.
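The authoritative definition is the custom-job-v2.yaml file in the repo, but conceptually an RDI transformation job of this kind is laid out along these lines; the field names come from the user table, while the expressions and exact options shown here are illustrative:

source:
  table: user
transform:
  - uses: add_field
    with:
      fields:
        - field: display_name
          language: jmespath
          expression: concat([first_name, ' ', last_name])
        - field: user_type
          language: sql
          expression: CASE WHEN email LIKE '%@example.com' THEN 'internal' ELSE 'external' END
output:
  - uses: redis.write
    with:
      data_type: json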
Let's verify this. In Redis Insight, start your data pipeline again so any new data can be streamed into the target. Now use the script demo-add-user.sql to insert a new row into the user table. Once you execute the script, check Redis Insight and observe the keys in your target database. You should see a new key with a JSON version of the user. However, the user type is still internal, as the email contains @example.com.
Let's change the email in the source table to trigger an update and the execution of the transformation job. Go ahead and use the script demo-modify-user.sql to update the user's email. You may need to identify which id is associated with the user before running the script, as you may need to adjust the WHERE clause of the SQL statement. Once you execute the script, you should immediately see the update in the target database.
This project is licensed under the MIT license.