Description
Problem
The Aerospike Kubernetes Operator (AKO) incorrectly flags pods for deletion when their container image URL is modified by an external mutating admission webhook. AKO's reconciliation logic compares the pod's live image URL with the one defined in the AerospikeCluster custom resource (CR). Because the webhook rewrites the URL on every pod creation, the two never match, and the result is an endless loop of pod deletion and recreation.
Context
Many Kubernetes clusters use mutating admission webhooks to automatically modify pod specifications during creation. A common use case is to enforce a pull-through cache for container images, like the one offered by Amazon ECR. In such scenarios, the webhook changes the image's registry prefix (e.g., from docker.io to aws_account_id.dkr.ecr.region.amazonaws.com/docker-hub/). The goal is to optimize image pulls, reduce costs, and improve security.
Current Behavior
AKO's reconciliation logic assumes the pod's image URL exactly matches the one specified in the AerospikeCluster custom resource (CR). When an admission webhook modifies the image URL (e.g., to use a pull-through cache), AKO detects a discrepancy. It then deletes the "mismatched" pod, expecting the StatefulSet controller to recreate it with the "correct" image. The recreation triggers the webhook again, which applies the same image mutation, and AKO deletes the pod once more. This cycle repeats indefinitely and prevents the pods from ever becoming ready.
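The loop can be illustrated with a minimal Go sketch. This is not AKO's actual code: mutateImage and strictMismatch are hypothetical names, the webhook is modeled as a plain prefix rewrite, and the registry prefix is the example one from this issue.

```go
package main

import "fmt"

// mutateImage stands in for the mutating admission webhook (hypothetical):
// it prepends a pull-through cache prefix to whatever image the pod requests.
func mutateImage(image string) string {
	return "123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/" + image
}

// strictMismatch models an exact-match check between the desired image
// and the image found on the live pod (assumed behavior, per this issue).
func strictMismatch(desired, live string) bool {
	return desired != live
}

func main() {
	desired := "aerospike/aerospike-server-enterprise:8.1"

	// Every recreation goes through the webhook again, so the live image
	// never equals the desired one and the pod is flagged each time.
	for attempt := 1; attempt <= 3; attempt++ {
		live := mutateImage(desired)
		fmt.Printf("attempt %d: live=%s mismatch=%t\n",
			attempt, live, strictMismatch(desired, live))
	}
}
```

Under these assumptions the mismatch is reported on every attempt, which matches the delete-and-recreate cycle described above.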
Expected Behavior
AKO should tolerate image mutations performed by admission webhooks. A better comparison would check whether the desired image name and tag (e.g., aerospike/aerospike-server-enterprise:8.1) are contained within the mutated image URL (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/aerospike/aerospike-server-enterprise:8.1). This approach would allow AKO to recognize that the image is correct, even if its registry prefix has been changed.
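One possible shape for such a comparison is sketched below in Go. It is an illustration only, not a proposed patch: imageMatches is a hypothetical helper. It is also slightly stricter than a plain "contains" check, since it requires the desired name and tag to appear at the end of the live URL and to start at a path boundary, so that an unrelated repository whose name merely ends in the same string does not match. Stripping a leading docker.io/ from the desired image is an added assumption, and digest-pinned images are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// imageMatches reports whether live is the desired image, possibly with a
// different registry or path prefix added in front of it (hypothetical helper).
func imageMatches(desired, live string) bool {
	if desired == live {
		return true
	}
	// Assumption: a desired image written as docker.io/<name>:<tag> should
	// match a mutated URL that only keeps <name>:<tag>.
	desired = strings.TrimPrefix(desired, "docker.io/")
	// Require the match to start right after a "/" so that, for example,
	// ".../notaerospike/..." does not match "aerospike/...".
	return strings.HasSuffix(live, "/"+desired)
}

func main() {
	desired := "aerospike/aerospike-server-enterprise:8.1"
	live := "123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/aerospike/aerospike-server-enterprise:8.1"
	fmt.Println(imageMatches(desired, live))
}
```

With a check of this kind, a pod whose image only gained a registry prefix would be treated as up to date, while a pod running a different tag would still be flagged.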