-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[automatic failover] Implement HealtStatusManager + weighted endpoints #4189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[automatic failover] Implement HealtStatusManager + weighted endpoints #4189
Conversation
atakavci
commented
Jun 27, 2025
- weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing integration tests impacted by weighted clusters
- Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
- add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check
if (newStatus.isHealthy()) { | ||
if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) { | ||
// lets check if weighted switching is possible | ||
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster(); | ||
if (failbackCluster == clusterWithHealthChange | ||
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) { | ||
setActiveCluster(clusterWithHealthChange, false); | ||
} | ||
} | ||
} else if (clusterWithHealthChange == activeCluster) { | ||
iterateActiveCluster(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-TODO-rov, is this what you suggest? if not, could you provide code snippet on your suggestion?
if (newStatus.isHealthy()) { | |
if (clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) { | |
// lets check if weighted switching is possible | |
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster(); | |
if (failbackCluster == clusterWithHealthChange | |
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) { | |
setActiveCluster(clusterWithHealthChange, false); | |
} | |
} | |
} else if (clusterWithHealthChange == activeCluster) { | |
iterateActiveCluster(); | |
} | |
if (clusterWithHealthChange == activeCluster && !newStatus.isHealthy()) { | |
iterateActiveCluster(); | |
return; | |
} | |
if (newStatus.isHealthy() && clusterWithHealthChange.isFailbackEnabled() && activeCluster != clusterWithHealthChange) { | |
// lets check if weighted switching is possible | |
Map.Entry<Endpoint, Cluster> failbackCluster = findWeightedFailbackCluster(); | |
if (failbackCluster == clusterWithHealthChange | |
&& clusterWithHealthChange.getWeight() > activeCluster.getWeight()) { | |
setActiveCluster(clusterWithHealthChange, false); | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi
Looks good in general.
Adding some comments and also have some concern around possible syncronisation issue when combining HealtStatusChecks
with existing CB failover logic.
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Outdated
Show resolved
Hide resolved
hey @dengliming , |
- clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the multi-cluster failover provider to use weighted, endpoint-based selection and adds a pluggable health check framework.
- Introduce
HealthStatusManager
and health check strategies (Echo, LagAware) to monitor endpoint health. - Refactor
MultiClusterPooledConnectionProvider
from index-based to weighted, endpoint-driven failover (iterateActiveCluster
,setActiveCluster
). - Update tests and client APIs (
UnifiedJedis
,CommandObjects
) to exercise the new health/failover behavior.
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java | Switch to setActiveCluster & endpoint-based cluster lookup |
src/test/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProviderTest.java | Update to weighted selection and iterateActiveCluster API |
src/test/java/redis/clients/jedis/misc/AutomaticFailoverTest.java | Replace incrementActiveMultiClusterIndex with iterateActiveCluster |
src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java | Add unit tests for HealthStatusManager and health checks |
src/test/java/redis/clients/jedis/mcf/HealthCheckIntegrationTest.java | Integration tests for disabling/default/custom health strategies |
src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java | Refactor integration tests for new builder options & failover API |
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java | Core refactor to weighted, endpoint-keyed clusters and health |
src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java | New REST helper for external DB availability checks |
src/main/java/redis/clients/jedis/mcf/LagAwareStrategy.java | Implement lag-aware health check strategy |
src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java | Central health status manager with listener support |
src/main/java/redis/clients/jedis/mcf/HealthStatusListener.java | Interface for health status event listeners |
src/main/java/redis/clients/jedis/mcf/HealthStatusChangeEvent.java | Event object for status changes |
src/main/java/redis/clients/jedis/mcf/HealthStatus.java | Enum encapsulating healthy/unhealthy state |
src/main/java/redis/clients/jedis/mcf/HealthCheckStrategy.java | Strategy interface for health checks |
src/main/java/redis/clients/jedis/mcf/HealthCheckCollection.java | Thread-safe collection of ongoing health checks |
src/main/java/redis/clients/jedis/mcf/HealthCheck.java | Schedules & executes periodic health checks |
src/main/java/redis/clients/jedis/mcf/EchoStrategy.java | Echo-based health check strategy |
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java | Update to use iterateActiveCluster for failover |
src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java | Drop obsolete options and use cluster retry flags |
src/main/java/redis/clients/jedis/UnifiedJedis.java | Adjust constructors and add echo command |
src/main/java/redis/clients/jedis/MultiClusterClientConfig.java | Extend config API for weight, health checks, retry/failback |
src/main/java/redis/clients/jedis/HostAndPort.java | Implement new Endpoint interface |
src/main/java/redis/clients/jedis/CommandObjects.java | Add echo command object |
Comments suppressed due to low confidence (4)
src/main/java/redis/clients/jedis/mcf/RedisRestAPIHelper.java:38
- [nitpick] There's a placeholder comment here but no actual logging of the
IOException
. Consider injecting a logger and recording the exception for troubleshooting.
HttpURLConnection getConnection = createConnection(bdbsUri, "GET");
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:41
- The comment references the old
incrementActiveMultiClusterIndex()
method. It should be updated to mentioniterateActiveCluster()
to match the current implementation.
try {
src/test/java/redis/clients/jedis/mcf/HealthCheckTest.java:300
- [nitpick] The comment says the default is null, but the code asserts the default is
EchoStrategy.DEFAULT
. Please update the comment to reflect the actual default strategy.
assertEquals(EchoStrategy.DEFAULT, clusterConfig.getHealthCheckStrategySupplier()); // Default is null (no health check)
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java:53
- [nitpick] This commented-out code references a non-existent method and clutters the code. Consider removing it for clarity.
// int activeMultiClusterIndex = provider.incrementActiveMultiClusterIndex1();
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Let's start merging them in the feature branch so we have a consistent state there
b79a9f3
into
redis:feature/automatic-failover