[automatic failover] Implement HealtStatusManager + weighted endpoints #4189

atakavci · 2025-06-27T16:18:15Z

weighted cluster selection
HealthStatusManager with initial listener and registration logic
pluggable health check strategy introduced; the draft strategies are NoOpStrategy, EchoStrategy and LagAwareStrategy
fix failing integration tests impacted by weighted clusters
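
For readers unfamiliar with the idea, here is a minimal sketch of weighted selection, assuming each cluster carries a numeric weight and a health flag and that the healthy cluster with the highest weight wins. `WeightedCluster` and `pickActive` are hypothetical names used only for illustration; they are not the classes or methods added by this PR.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a configured cluster; not the Jedis Cluster class.
final class WeightedCluster {
    final String name;
    final float weight;
    final boolean healthy;

    WeightedCluster(String name, float weight, boolean healthy) {
        this.name = name;
        this.weight = weight;
        this.healthy = healthy;
    }
}

final class WeightedSelectionSketch {
    // Returns the healthy cluster with the highest weight, or empty if none is healthy.
    static Optional<WeightedCluster> pickActive(List<WeightedCluster> clusters) {
        return clusters.stream()
            .filter(c -> c.healthy)
            .max(Comparator.comparingDouble((WeightedCluster c) -> c.weight));
    }

    public static void main(String[] args) {
        List<WeightedCluster> clusters = Arrays.asList(
            new WeightedCluster("east", 1.0f, true),
            new WeightedCluster("west", 2.0f, false), // heaviest, but unhealthy
            new WeightedCluster("north", 0.5f, true));
        System.out.println(pickActive(clusters).map(c -> c.name).orElse("none"));
    }
}
```

In this example the heaviest cluster is skipped because it is unhealthy, so the selection falls to `east`.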

- HealthStatusManager with initial listener and registration logic
- pluggable health check strategy introduced; the draft strategies are NoOpStrategy, EchoStrategy and LagAwareStrategy
- fix failing tests impacted by weighted clusters

src/main/java/redis/clients/jedis/HostAndPort.java

src/main/java/redis/clients/jedis/UnifiedJedis.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

src/main/java/redis/clients/jedis/mcf/HealthCheck.java

- add echo to CommandObjects and UnifiedJedis
- improve StrategySupplier by accepting JedisClientConfig
- adapt EchoStrategy to StrategySupplier; it now handles connection creation by accepting an endpoint and a JedisClientConfig
- make health checks disabled by default
- drop NoOpStrategy
- add unit and integration tests for health checks
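
A rough sketch of the supplier pattern this commit describes, assuming a strategy is built from an endpoint plus client settings and that an echo check passes when the reply equals the message sent. `ProbeStrategy`, `ProbeStrategySupplier` and `EchoProbe` are hypothetical names, and the `UnaryOperator<String>` transport stands in for a real connection; none of this is the actual Jedis API.

```java
import java.util.function.UnaryOperator;

// Hypothetical strategy contract: one health probe per call.
interface ProbeStrategy {
    boolean check();
}

// Hypothetical factory: builds a strategy from an endpoint and client settings.
interface ProbeStrategySupplier {
    ProbeStrategy get(String endpoint, int timeoutMillis);
}

// Echo-style probe: healthy only if the reply matches what was sent.
final class EchoProbe implements ProbeStrategy {
    private static final String MESSAGE = "HealthCheck";
    private final UnaryOperator<String> transport; // stands in for a real connection

    EchoProbe(UnaryOperator<String> transport) {
        this.transport = transport;
    }

    @Override
    public boolean check() {
        try {
            return MESSAGE.equals(transport.apply(MESSAGE));
        } catch (RuntimeException e) {
            // Any transport failure counts as unhealthy in this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        // The supplier ignores its arguments here; a real one would open a connection.
        ProbeStrategySupplier supplier = (endpoint, timeoutMillis) -> new EchoProbe(msg -> msg);
        System.out.println(supplier.get("localhost:6379", 1000).check());
    }
}
```

Keeping connection creation inside the supplier means callers only pass configuration, which matches the intent stated in the commit message.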

atakavci · 2025-07-10T12:02:32Z

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

+        if (newStatus.isHealthy()) {
+            if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
+                // lets check if weighted switching is possible
+                Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
+                if (failbackCluster == clusterWithHealthChange
+                    && clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
+                    setActiveCluster(clusterWithHealthChange, false);
+                }
+            }
+        } else if (clusterWithHealthChange == activeCluster) {
+            iterateActiveCluster();
+        }


@a-TODO-rov, is this what you suggest? If not, could you provide a code snippet for your suggestion?

Suggested change

-        if (newStatus.isHealthy()) {
-            if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
-                // lets check if weighted switching is possible
-                Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
-                if (failbackCluster == clusterWithHealthChange
-                    && clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
-                    setActiveCluster(clusterWithHealthChange, false);
-                }
-            }
-        } else if (clusterWithHealthChange == activeCluster) {
-            iterateActiveCluster();
-        }
+        if (clusterWithHealthChange == activeCluster && !newStatus.isHealthy()) {
+            iterateActiveCluster();
+            return;
+        }
+        if (newStatus.isHealthy() && clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
+            // lets check if weighted switching is possible
+            Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
+            if (failbackCluster == clusterWithHealthChange
+                && clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
+                setActiveCluster(clusterWithHealthChange, false);
+            }
+        }
+    }
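
To make the guard-clause shape of the suggestion easier to try out, here is a self-contained sketch under simplifying assumptions: clusters are plain objects with a weight, a health flag and a failback flag, and the best candidate is simply the healthy node with the highest weight. `FailbackSketch`, `Node` and `bestHealthy` are hypothetical names and do not reproduce the provider's real implementation.

```java
import java.util.Arrays;
import java.util.List;

final class FailbackSketch {
    // Hypothetical stand-in for a cluster entry.
    static final class Node {
        final String name;
        final float weight;
        final boolean failbackEnabled;
        boolean healthy = true;

        Node(String name, float weight, boolean failbackEnabled) {
            this.name = name;
            this.weight = weight;
            this.failbackEnabled = failbackEnabled;
        }
    }

    private final List<Node> nodes;
    private Node active;

    FailbackSketch(List<Node> nodes, Node initial) {
        this.nodes = nodes;
        this.active = initial;
    }

    Node active() {
        return active;
    }

    // Highest-weight healthy node other than `excluded`, or null if there is none.
    private Node bestHealthy(Node excluded) {
        Node best = null;
        for (Node n : nodes) {
            if (n != excluded && n.healthy && (best == null || n.weight > best.weight)) {
                best = n;
            }
        }
        return best;
    }

    void onHealthChange(Node changed, boolean healthy) {
        changed.healthy = healthy;
        // Guard 1: the active node went down, so fail over and stop.
        if (changed == active && !healthy) {
            Node next = bestHealthy(active);
            if (next != null) {
                active = next;
            }
            return;
        }
        // Guard 2: another node recovered; fail back only if it is the best
        // candidate and outweighs the currently active node.
        if (healthy && changed.failbackEnabled && changed != active) {
            Node candidate = bestHealthy(null);
            if (candidate == changed && changed.weight > active.weight) {
                active = changed;
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a", 2.0f, true);
        Node b = new Node("b", 1.0f, true);
        FailbackSketch sketch = new FailbackSketch(Arrays.asList(a, b), a);
        sketch.onHealthChange(a, false); // fails over to b
        sketch.onHealthChange(a, true);  // fails back to a, which has the higher weight
        System.out.println(sketch.active().name);
    }
}
```

The early return keeps the failover path separate from the failback path, which is the readability gain the suggestion is after.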

ggivo

Hi,
Looks good in general.
I'm adding some comments, and I also have a concern about a possible synchronisation issue when combining the health status checks with the existing circuit breaker (CB) failover logic.
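
One possible way to think about that concern, sketched under the assumption that both a health listener and the circuit breaker may try to switch the active cluster at the same time: each caller states which cluster it believes is active, and the switch is applied only if that view is still current. `ActiveClusterGuard` and `switchFrom` are hypothetical names; the thread does not say how the PR actually handles this.

```java
// Hypothetical guard around the active cluster name; not part of Jedis.
final class ActiveClusterGuard {
    private final Object lock = new Object();
    private String active;

    ActiveClusterGuard(String initial) {
        this.active = initial;
    }

    // Switches to `next` only if `expected` is still the active cluster.
    // A caller working from a stale view gets false and can re-evaluate.
    boolean switchFrom(String expected, String next) {
        synchronized (lock) {
            if (!active.equals(expected)) {
                return false;
            }
            active = next;
            return true;
        }
    }

    String current() {
        synchronized (lock) {
            return active;
        }
    }

    public static void main(String[] args) {
        ActiveClusterGuard guard = new ActiveClusterGuard("east");
        // The health listener and the circuit breaker both saw "east" as active.
        boolean first = guard.switchFrom("east", "west");
        boolean second = guard.switchFrom("east", "north"); // stale view, rejected
        System.out.println(first + " " + second + " " + guard.current());
    }
}
```

With this shape, the second switch request is rejected instead of silently overriding the first one.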

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

src/main/java/redis/clients/jedis/mcf/FailoverOptions.java

src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java

src/main/java/redis/clients/jedis/mcf/FailoverOptions.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

src/main/java/redis/clients/jedis/mcf/HealthStatus.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

atakavci · 2025-07-11T08:42:51Z

Hey @dengliming,
what about these comments you had via email? Could you explain them? If this is not something planned, please remove or disable the automation or whatever process this is.
This is kind of spamming our PRs, so thank you for your cooperation.

- clear redundant catch
- replace failover options and drop FailoverOptions class
- remove forced_unhealthy from HealthStatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

Copilot

Pull Request Overview

This PR refactors the multi-cluster failover provider to use weighted, endpoint-based selection and adds a pluggable health check framework.

Introduce HealthStatusManager and health check strategies (Echo, LagAware) to monitor endpoint health.
Refactor MultiClusterPooledConnectionProvider from index-based to weighted, endpoint-driven failover (iterateActiveCluster, setActiveCluster).
Update tests and client APIs (UnifiedJedis, CommandObjects) to exercise the new health/failover behavior.
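
As an illustration of the listener-based design summarized above, here is a minimal sketch of a status manager that notifies listeners only on actual status changes and ignores duplicate registrations. `StatusSketch`, `StatusListenerSketch` and `StatusManagerSketch` are hypothetical names, and the behavior shown is an assumption rather than a description of the PR's `HealthStatusManager`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

enum StatusSketch { HEALTHY, UNHEALTHY }

// Hypothetical listener contract; oldStatus is null on the first report for an endpoint.
interface StatusListenerSketch {
    void onChange(String endpoint, StatusSketch oldStatus, StatusSketch newStatus);
}

final class StatusManagerSketch {
    private final Map<String, StatusSketch> statuses = new ConcurrentHashMap<>();
    // A set, so registering the same listener twice has no effect.
    private final Set<StatusListenerSketch> listeners = new CopyOnWriteArraySet<>();

    void register(StatusListenerSketch listener) {
        listeners.add(listener);
    }

    // Records the new status and notifies listeners only when it differs from the previous one.
    void report(String endpoint, StatusSketch next) {
        StatusSketch previous = statuses.put(endpoint, next);
        if (previous == next) {
            return;
        }
        for (StatusListenerSketch listener : listeners) {
            listener.onChange(endpoint, previous, next);
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        StatusManagerSketch manager = new StatusManagerSketch();
        StatusListenerSketch listener = (endpoint, oldStatus, newStatus) ->
            events.add(endpoint + ": " + oldStatus + " -> " + newStatus);
        manager.register(listener);
        manager.register(listener); // duplicate registration is ignored
        manager.report("east", StatusSketch.HEALTHY);
        manager.report("east", StatusSketch.HEALTHY); // no change, no event
        manager.report("east", StatusSketch.UNHEALTHY);
        events.forEach(System.out::println);
    }
}
```

Using a set for listeners is one simple way to avoid delivering the same event twice to a listener that was registered more than once.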

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

File	Description
src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java	Switch to `setActiveCluster` & endpoint-based cluster lookup
src/test/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProviderTest.java	Update to weighted selection and `iterateActiveCluster` API
src/test/java/redis/clients/jedis/misc/AutomaticFailoverTest.java	Replace `incrementActiveMultiClusterIndex` with `iterateActiveCluster`
src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java	Add unit tests for `HealthStatusManager` and health checks
src/test/java/redis/clients/jedis/mcf/HealthCheckIntegrationTest.java	Integration tests for disabling/default/custom health strategies
src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java	Refactor integration tests for new builder options & failover API
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java	Core refactor to weighted, endpoint-keyed clusters and health
src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java	New REST helper for external DB availability checks
src/main/java/redis/clients/jedis/mcf/LagAwareStrategy.java	Implement lag-aware health check strategy
src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java	Central health status manager with listener support
src/main/java/redis/clients/jedis/mcf/HealthStatusListener.java	Interface for health status event listeners
src/main/java/redis/clients/jedis/mcf/HealthStatusChangeEvent.java	Event object for status changes
src/main/java/redis/clients/jedis/mcf/HealthStatus.java	Enum encapsulating healthy/unhealthy state
src/main/java/redis/clients/jedis/mcf/HealthCheckStrategy.java	Strategy interface for health checks
src/main/java/redis/clients/jedis/mcf/HealthCheckCollection.java	Thread-safe collection of ongoing health checks
src/main/java/redis/clients/jedis/mcf/HealthCheck.java	Schedules & executes periodic health checks
src/main/java/redis/clients/jedis/mcf/EchoStrategy.java	Echo-based health check strategy
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java	Update to use `iterateActiveCluster` for failover
src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java	Drop obsolete options and use cluster retry flags
src/main/java/redis/clients/jedis/UnifiedJedis.java	Adjust constructors and add `echo` command
src/main/java/redis/clients/jedis/MultiClusterClientConfig.java	Extend config API for weight, health checks, retry/failback
src/main/java/redis/clients/jedis/HostAndPort.java	Implement new `Endpoint` interface
src/main/java/redis/clients/jedis/CommandObjects.java	Add `echo` command object

Comments suppressed due to low confidence (4)

src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java:38

[nitpick] There's a placeholder comment here but no actual logging of the IOException. Consider injecting a logger and recording the exception for troubleshooting.

        HttpURLConnection getConnection = createConnection(bdbsUri, "GET");

src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:41

The comment references the old incrementActiveMultiClusterIndex() method. It should be updated to mention iterateActiveCluster() to match the current implementation.

        try {

src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java:300

[nitpick] The comment says the default is null, but the code asserts the default is EchoStrategy.DEFAULT. Please update the comment to reflect the actual default strategy.

        assertEquals(EchoStrategy.DEFAULT, clusterConfig.getHealthCheckStrategySupplier()); // Default is null (no health check)

src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:53

[nitpick] This commented-out code references a non-existent method and clutters the code. Consider removing it for clarity.

                // int activeMultiClusterIndex = provider.incrementActiveMultiClusterIndex1();

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ggivo

LGTM

Let's start merging them in the feature branch so we have a consistent state there

- weighted cluster selection

8a9f876

atakavci requested review from uglide and ggivo June 30, 2025 10:00

a-TODO-rov reviewed Jul 7, 2025

vladvildanov reviewed Jul 8, 2025

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java (outdated, resolved)

src/main/java/redis/clients/jedis/mcf/HealthCheck.java (resolved)

atakavci added 3 commits July 9, 2025 12:40

- fix naming

df66b1e

clean up and mark override methods

13757f5

atakavci commented Jul 10, 2025

atakavci added 2 commits July 10, 2025 15:15

fix link in javadoc

ef5d83a

fix formatting

a15fc64

ggivo reviewed Jul 10, 2025

redis deleted a comment from dengliming Jul 10, 2025

atakavci self-assigned this Jul 11, 2025

atakavci added the feature label Jul 11, 2025

atakavci mentioned this pull request Jul 11, 2025

[automatic failover] Implement add/remove endpoints #4200

Merged

- fix double-registered listeners in HealthStatusManager

cf38240

atakavci requested a review from Copilot July 14, 2025 14:35

Copilot AI reviewed Jul 14, 2025

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java (outdated, resolved)

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java (outdated, resolved)

Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

c2fb34c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

This comment was marked as off-topic.

ggivo approved these changes Aug 11, 2025

atakavci merged commit b79a9f3 into redis:feature/automatic-failover Aug 11, 2025
5 of 6 checks passed
