Description
Problem
The Aerospike Kubernetes Operator (AKO) repeatedly deletes and recreates pods flagged as Unschedulable by the Kubernetes scheduler. This behavior prevents cluster autoscalers, such as Karpenter, from provisioning new nodes, leading to an infinite loop of pod creation and termination.
Context
In a cost-efficient cluster design, nodes are provisioned on demand rather than pre-provisioned as warm spare capacity. Cluster autoscalers like Karpenter or the Cluster Autoscaler are responsible for adding new nodes when they detect pods in a pending state. Karpenter, for example, typically provisions new nodes only after the Kubernetes scheduler marks a pod as Unschedulable, allowing a configurable --batch-idle-duration to consolidate multiple pending pods before acting.
Current Behavior
AKO's reconciliation logic proactively deletes Aerospike pods almost immediately after the Kubernetes scheduler marks them as Unschedulable. This happens before a cluster autoscaler has sufficient time to observe the pending pod and provision new capacity. As a result, the autoscaler never sees a stable Unschedulable pod to trigger a scale-up event. This creates a destructive feedback loop where AKO continuously deletes pods, which are then immediately recreated by the StatefulSet controller, only to be deleted again by AKO.
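Both the operator and the autoscaler key off the same signal: the scheduler sets the pod's PodScheduled condition to False with reason Unschedulable. A minimal Go sketch of that check using the standard core/v1 types (the function name is illustrative, not AKO's actual code):

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
)

// isUnschedulable reports whether the scheduler has marked the pod as
// unschedulable, i.e. its PodScheduled condition is False with reason
// "Unschedulable". This is the state Karpenter waits to observe before
// provisioning a node, and the state that currently causes AKO to treat
// the pod as failed and delete it.
func isUnschedulable(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled &&
			cond.Status == corev1.ConditionFalse &&
			cond.Reason == corev1.PodReasonUnschedulable {
			return true
		}
	}
	return false
}
```

Because AKO acts on this condition within seconds while Karpenter deliberately waits out its batching window, the operator always wins the race and the scale-up never happens.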
Details
When creating a new Aerospike cluster, pods are created and deleted endlessly, as shown below:
NAME READY STATUS RESTARTS AGE
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
firstcluster-0-2 0/1 Terminating 0 4s
firstcluster-0-2 0/1 Terminating 0 4s
firstcluster-0-1 0/1 Terminating 0 4s
firstcluster-0-1 0/1 Terminating 0 4s
firstcluster-0-0 0/1 Terminating 0 4s
firstcluster-0-0 0/1 Terminating 0 4s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
firstcluster-0-2 0/1 Terminating 0 0s
firstcluster-0-2 0/1 Terminating 0 0s
firstcluster-0-1 0/1 Terminating 0 0s
firstcluster-0-1 0/1 Terminating 0 0s
firstcluster-0-0 0/1 Terminating 0 0s
firstcluster-0-0 0/1 Terminating 0 0s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-2 0/1 Pending 0 0s
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-0 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
firstcluster-0-1 0/1 Pending 0 0s
In the AKO logs, you will find messages like the following:
2025-09-01T09:23:13Z DEBUG controller.AerospikeCluster Check statefulSet pod running and ready {"aerospikecluster": {"name":"firstcluster","namespace":"aerospike"}, "pod": "firstcluster-0-0"}
2025-09-01T09:23:13Z ERROR controller.AerospikeCluster Failed to wait for statefulset to be ready {"aerospikecluster": {"name":"firstcluster","namespace":"aerospike"}, "STS": {"name":"firstcluster-0","namespace":"aerospike"}, "error": "statefulSet pod firstcluster-0-0 failed: pod firstcluster-0-0 is in unschedulable state and reason is 0/23 nodes are available: 20 node(s) didn't match Pod's node affinity/selector. preemption: 0/23 nodes are available: 23 Preemption is not helpful for scheduling."}
Expected Behavior
AKO should not immediately delete Unschedulable pods. Instead, it should wait for a configurable timeout period. This would allow a cluster autoscaler to observe the pending pods and provision the necessary node capacity. Once new nodes are available, the pods can be scheduled successfully.
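A sketch of the proposed behavior, assuming a configurable grace period (the name and default below are placeholders, not existing AKO settings): instead of failing an unschedulable pod immediately, the reconciler would only give up once the pod has been unschedulable for longer than the timeout, requeueing in the meantime.

```go
package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// unschedulableGracePeriod is a placeholder for the proposed configurable
// timeout; the value here is an assumption, not an AKO default.
const unschedulableGracePeriod = 5 * time.Minute

// shouldDeleteUnschedulablePod returns true only if the pod has been
// unschedulable for longer than the grace period, giving a cluster
// autoscaler time to observe the pending pod and provision a node.
func shouldDeleteUnschedulablePod(pod *corev1.Pod, now time.Time) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled &&
			cond.Status == corev1.ConditionFalse &&
			cond.Reason == corev1.PodReasonUnschedulable {
			// Only give up after the pod has been unschedulable longer than
			// the grace period; before that, keep waiting for a scale-up.
			return now.Sub(cond.LastTransitionTime.Time) > unschedulableGracePeriod
		}
	}
	return false
}
```

Within the grace period the reconciler would simply requeue and re-check, so a successful scale-up lets the pod schedule without ever being deleted.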