Skip to content

[automatic failover] Implement HealtStatusManager + weighted endpoints #4189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

atakavci
Copy link
Contributor

  • weighted cluster seleciton
  • Healtstatus manager with initial listener and registration logic
  • pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
  • fix failing integration tests impacted by weighted clusters

- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters
@atakavci atakavci requested review from uglide and ggivo June 30, 2025 10:00
atakavci added 3 commits July 9, 2025 12:40
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check
Comment on lines 196 to 207
if (newStatus.isHealthy()) {
if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
// lets check if weighted switching is possible
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
if (failbackCluster == clusterWithHealthChange
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
setActiveCluster(clusterWithHealthChange, false);
}
}
} else if (clusterWithHealthChange == activeCluster) {
iterateActiveCluster();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-TODO-rov, is this what you suggest? if not, could you provide code snippet on your suggestion?

Suggested change
if (newStatus.isHealthy()) {
if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
// lets check if weighted switching is possible
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
if (failbackCluster == clusterWithHealthChange
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
setActiveCluster(clusterWithHealthChange, false);
}
}
} else if (clusterWithHealthChange == activeCluster) {
iterateActiveCluster();
}
if (clusterWithHealthChange == activeCluster && !newStatus.isHealthy()) {
iterateActiveCluster();
return;
}
if (newStatus.isHealthy() && clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) {
// lets check if weighted switching is possible
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster();
if (failbackCluster == clusterWithHealthChange
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) {
setActiveCluster(clusterWithHealthChange, false);
}
}
}

Copy link
Collaborator

@ggivo ggivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi
Looks good in general.
Adding some comments and also have some concern around possible syncronisation issue when combining HealtStatusChecks with existing CB failover logic.

@redis redis deleted a comment from dengliming Jul 10, 2025
@atakavci
Copy link
Contributor Author

hey @dengliming ,
what about these comments you had via email. Could you explain and if this is something not planned, please remove or disable the automation or whatever process this is.
This is kind of spamming our PR's , so thank you for your cooperation.

- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests
@atakavci atakavci requested a review from Copilot July 14, 2025 14:35
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the multi-cluster failover provider to use weighted, endpoint-based selection and adds a pluggable health check framework.

  • Introduce HealthStatusManager and health check strategies (Echo, LagAware) to monitor endpoint health.
  • Refactor MultiClusterPooledConnectionProvider from index-based to weighted, endpoint-driven failover (iterateActiveCluster, setActiveCluster).
  • Update tests and client APIs (UnifiedJedis, CommandObjects) to exercise the new health/failover behavior.

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java Switch to setActiveCluster & endpoint-based cluster lookup
src/test/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProviderTest.java Update to weighted selection and iterateActiveCluster API
src/test/java/redis/clients/jedis/misc/AutomaticFailoverTest.java Replace incrementActiveMultiClusterIndex with iterateActiveCluster
src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java Add unit tests for HealthStatusManager and health checks
src/test/java/redis/clients/jedis/mcf/HealthCheckIntegrationTest.java Integration tests for disabling/default/custom health strategies
src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java Refactor integration tests for new builder options & failover API
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java Core refactor to weighted, endpoint-keyed clusters and health
src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java New REST helper for external DB availability checks
src/main/java/redis/clients/jedis/mcf/LagAwareStrategy.java Implement lag-aware health check strategy
src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java Central health status manager with listener support
src/main/java/redis/clients/jedis/mcf/HealthStatusListener.java Interface for health status event listeners
src/main/java/redis/clients/jedis/mcf/HealthStatusChangeEvent.java Event object for status changes
src/main/java/redis/clients/jedis/mcf/HealthStatus.java Enum encapsulating healthy/unhealthy state
src/main/java/redis/clients/jedis/mcf/HealthCheckStrategy.java Strategy interface for health checks
src/main/java/redis/clients/jedis/mcf/HealthCheckCollection.java Thread-safe collection of ongoing health checks
src/main/java/redis/clients/jedis/mcf/HealthCheck.java Schedules & executes periodic health checks
src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Echo-based health check strategy
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java Update to use iterateActiveCluster for failover
src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java Drop obsolete options and use cluster retry flags
src/main/java/redis/clients/jedis/UnifiedJedis.java Adjust constructors and add echo command
src/main/java/redis/clients/jedis/MultiClusterClientConfig.java Extend config API for weight, health checks, retry/failback
src/main/java/redis/clients/jedis/HostAndPort.java Implement new Endpoint interface
src/main/java/redis/clients/jedis/CommandObjects.java Add echo command object
Comments suppressed due to low confidence (4)

src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java:38

  • [nitpick] There's a placeholder comment here but no actual logging of the IOException. Consider injecting a logger and recording the exception for troubleshooting.
        HttpURLConnection getConnection = createConnection(bdbsUri, "GET");

src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:41

  • The comment references the old incrementActiveMultiClusterIndex() method. It should be updated to mention iterateActiveCluster() to match the current implementation.
        try {

src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java:300

  • [nitpick] The comment says the default is null, but the code asserts the default is EchoStrategy.DEFAULT. Please update the comment to reflect the actual default strategy.
        assertEquals(EchoStrategy.DEFAULT, clusterConfig.getHealthCheckStrategySupplier()); // Default is null (no health check)

src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:53

  • [nitpick] This commented-out code references a non-existent method and clutters the code. Consider removing it for clarity.
                // int activeMultiClusterIndex = provider.incrementActiveMultiClusterIndex1();

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@dengliming

This comment was marked as off-topic.

Copy link
Collaborator

@ggivo ggivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Let's start merging them in the feature branch so we have a consistent state there

@atakavci atakavci merged commit b79a9f3 into redis:feature/automatic-failover Aug 11, 2025
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants