Skip to content

Support Valkey upgrades on elasticache_global_replication_group #42636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

twelsh-aw
Copy link
Contributor

Once a replication group is added to a global datastore, all of it's engine upgrades need to be controlled on the global datastore. This resource did not support setting an explicit engine value on global datastores, preventing upgrades (#41618).

Further, if creating a Valkey global datastore by inheriting version from the primary (or manually upgrading the engine of the global datastore in AWS Console), then you could not continue manage the secondary replication groups in terraform. A Valkey engine would cause an incorrect destroy & re-create plan (if using defaults for engine) or a conflict (if trying to set the engine correctly). We remove the default engine value, mark it as computed and add DiffSuppressFunc to engine, engine_version and parmeter_group_name to ignore changes to these attributes in plans when the resource belongs to a global datastore (primary or secondary). Fixes #40786 and #41713.

This also has the benefit of removing the need to add explicit ignore_changes lifecycle blocks to attempt to work around these plans. We update existing documentation on around this.

Rollback Plan

If a change needs to be reverted, we will publish an updated version of the library.

Revert the PR. Changes meant to be backwards compatible.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

No.

Description

Relations

References

Output from Acceptance Testing

I ran my new tests that cover each of the issues and I sampled through a subset of existing tests (these are long).

make testacc TESTS="TestAccElastiCacheGlobalReplicationGroup_(SetEngineOn|InheritValkeyEngine)" PKG=elasticache ACCTEST_PARALLELISM=3
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.8 test ./internal/service/elasticache/... -v -count 1 -parallel 3 -run='TestAccElastiCacheGlobalReplicationGroup_(SetEngineOn|InheritValkeyEngine)'  -timeout 360m -vet=off
2025/05/15 12:22:58 Initializing Terraform AWS Provider...
=== RUN   TestAccElastiCacheGlobalReplicationGroup_SetEngineOnCreate_ValkeyUpgrade
=== PAUSE TestAccElastiCacheGlobalReplicationGroup_SetEngineOnCreate_ValkeyUpgrade
=== RUN   TestAccElastiCacheGlobalReplicationGroup_SetEngineOnUpdate_ValkeyUpgrade
=== PAUSE TestAccElastiCacheGlobalReplicationGroup_SetEngineOnUpdate_ValkeyUpgrade
=== RUN   TestAccElastiCacheGlobalReplicationGroup_InheritValkeyEngine_SecondaryReplicationGroup
=== PAUSE TestAccElastiCacheGlobalReplicationGroup_InheritValkeyEngine_SecondaryReplicationGroup
=== CONT  TestAccElastiCacheGlobalReplicationGroup_SetEngineOnCreate_ValkeyUpgrade
=== CONT  TestAccElastiCacheGlobalReplicationGroup_InheritValkeyEngine_SecondaryReplicationGroup
=== CONT  TestAccElastiCacheGlobalReplicationGroup_SetEngineOnUpdate_ValkeyUpgrade
--- PASS: TestAccElastiCacheGlobalReplicationGroup_SetEngineOnUpdate_ValkeyUpgrade (2373.88s)
--- PASS: TestAccElastiCacheGlobalReplicationGroup_SetEngineOnCreate_ValkeyUpgrade (2499.76s)
--- PASS: TestAccElastiCacheGlobalReplicationGroup_InheritValkeyEngine_SecondaryReplicationGroup (3045.84s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/elasticache        3049.872smake testacc PKG=elasticache ACCTEST_PARALLELISM=5 TESTS="TestAccElastiCacheReplicationGroup_EngineVersion_update|TestAccElastiCacheReplicationGroup_OutOfBandUpgrade|TestAccElastiCacheReplicationGroup_Valkey_basic|TestAccElastiCacheReplicationGroup_Engine_RedisToValkey|TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic|TestAccElastiCacheReplicationGroup_Redis_basic_v5"
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.8 test ./internal/service/elasticache/... -v -count 1 -parallel 5 -run='TestAccElastiCacheReplicationGroup_EngineVersion_update|TestAccElastiCacheReplicationGroup_OutOfBandUpgrade|TestAccElastiCacheReplicationGroup_Valkey_basic|TestAccElastiCacheReplicationGroup_Engine_RedisToValkey|TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic|TestAccElastiCacheReplicationGroup_Redis_basic_v5'  -timeout 360m -vet=off
2025/05/15 09:30:42 Initializing Terraform AWS Provider...
=== RUN   TestAccElastiCacheReplicationGroup_Redis_basic_v5
=== PAUSE TestAccElastiCacheReplicationGroup_Redis_basic_v5
=== RUN   TestAccElastiCacheReplicationGroup_Valkey_basic
=== PAUSE TestAccElastiCacheReplicationGroup_Valkey_basic
=== RUN   TestAccElastiCacheReplicationGroup_OutOfBandUpgrade
=== PAUSE TestAccElastiCacheReplicationGroup_OutOfBandUpgrade
=== RUN   TestAccElastiCacheReplicationGroup_EngineVersion_update
=== PAUSE TestAccElastiCacheReplicationGroup_EngineVersion_update
=== RUN   TestAccElastiCacheReplicationGroup_Engine_RedisToValkey
=== PAUSE TestAccElastiCacheReplicationGroup_Engine_RedisToValkey
=== RUN   TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic
=== PAUSE TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic
=== CONT  TestAccElastiCacheReplicationGroup_Redis_basic_v5
=== CONT  TestAccElastiCacheReplicationGroup_EngineVersion_update
=== CONT  TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic
=== CONT  TestAccElastiCacheReplicationGroup_Engine_RedisToValkey
=== CONT  TestAccElastiCacheReplicationGroup_OutOfBandUpgrade
--- PASS: TestAccElastiCacheReplicationGroup_Redis_basic_v5 (800.68s)
=== CONT  TestAccElastiCacheReplicationGroup_Valkey_basic
--- PASS: TestAccElastiCacheReplicationGroup_Valkey_basic (871.53s)
--- PASS: TestAccElastiCacheReplicationGroup_Engine_RedisToValkey (1686.83s)
--- PASS: TestAccElastiCacheReplicationGroup_OutOfBandUpgrade (1689.58s)
--- PASS: TestAccElastiCacheReplicationGroup_GlobalReplicationGroupID_basic (3000.28s)
--- PASS: TestAccElastiCacheReplicationGroup_EngineVersion_update (6488.20s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/elasticache        6492.404s
...

Once a replication group is added to a global datastore, all of it's engine upgrades need to be controlled on the global datastore. This resource did not support setting an explicit engine value on global datastores, preventing upgrades (hashicorp#41618).

Further, if creating a Valkey global datastore by inheriting version from the primary (or manually upgrading the engine of the global datastore in AWS Console), then you could not continue manage the secondary replication groups in terraform. A Valkey engine would cause an incorrect destroy & re-create plan (if using defaults for engine) or a conflict (if trying to set the engine correctly). We remove the default engine value, mark it as computed and add DiffSuppressFunc to engine, engine_version and parmeter_group_name to ignore changes to these attributes in plans when the resource belongs to a global datastore (primary or secondary). Fixes hashicorp#40786 and hashicorp#41713.

This also has the benefit of removing the need to add explicit ignore_changes lifecycle blocks to attempt to work around these plans. We update existing documentation on around this.
Copy link

Community Guidelines

This comment is added to every new Pull Request to provide quick reference to how the Terraform AWS Provider is maintained. Please review the information below, and thank you for contributing to the community that keeps the provider thriving! 🚀

Voting for Prioritization

  • Please vote on this Pull Request by adding a 👍 reaction to the original post to help the community and maintainers prioritize it.
  • Please see our prioritization guide for additional information on how the maintainers handle prioritization.
  • Please do not leave +1 or other comments that do not add relevant new information or questions; they generate extra noise for others following the Pull Request and do not help prioritize the request.

Pull Request Authors

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. documentation Introduces or discusses updates to documentation. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. service/elasticache Issues and PRs that pertain to the elasticache service. size/XL Managed by automation to categorize the size of a PR. labels May 15, 2025
@twelsh-aw twelsh-aw marked this pull request as ready for review May 15, 2025 21:57
@twelsh-aw twelsh-aw requested a review from a team as a code owner May 15, 2025 21:57
Copy link

github-actions bot commented May 15, 2025

✅ Thank you for correcting the previously detected issues! The maintainers appreciate your efforts to make the review process as smooth as possible.

twelsh-aw added 2 commits May 15, 2025 18:04
This is redis6.x specific. It has nothing to do with my DiffSuppressFunc changes or removing the lifecycle { ignore_changes } blocks. Still needed to pass tests :)

```
Step 2/2 error running import: ImportStateVerify attributes not equivalent. Difference is shown below. The - symbol indicates attributes missing after import.

          map[string]string{
        -       "engine_version": "6.x",
        +       "engine_version": "6.2",
          }
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Introduces or discusses updates to documentation. needs-triage Waiting for first response or review from a maintainer. service/elasticache Issues and PRs that pertain to the elasticache service. size/XL Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: aws_elasticache_replication_group for valkey engine is stuck in resource replace loop
1 participant