
AKO pod reconciliation loop interferes with cluster autoscaling #400

@rodrigorfk

Description


Problem

The Aerospike Kubernetes Operator (AKO) repeatedly deletes and recreates pods flagged as Unschedulable by the Kubernetes scheduler. This behavior prevents cluster autoscalers, such as Karpenter, from provisioning new nodes, leading to an infinite loop of pod creation and termination.

Context

In a cost-efficient cluster design, nodes are provisioned on demand rather than pre-provisioned as warm spare capacity. Cluster autoscalers like Karpenter or the Cluster Autoscaler are responsible for adding new nodes when they detect pods in a pending state. Karpenter, for example, typically provisions new nodes only after the Kubernetes scheduler marks a pod as Unschedulable, allowing a configurable --batch-idle-duration to consolidate multiple pending pods before acting.
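
The "Unschedulable" state that autoscalers key off is reported by the scheduler as a pod condition. A minimal sketch of that check, using stand-in structs rather than the real `k8s.io/api/core/v1` types, looks like this:

```go
package main

import "fmt"

// PodCondition is a minimal stand-in for the Kubernetes pod status condition
// type (the real one lives in k8s.io/api/core/v1).
type PodCondition struct {
	Type   string // e.g. "PodScheduled"
	Status string // "True" or "False"
	Reason string // e.g. "Unschedulable"
}

// isUnschedulable mirrors the check autoscalers such as Karpenter perform:
// the PodScheduled condition is False with reason Unschedulable.
func isUnschedulable(conds []PodCondition) bool {
	for _, c := range conds {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

func main() {
	pending := []PodCondition{
		{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"},
	}
	fmt.Println(isUnschedulable(pending))
}
```

A pod only carries this condition while it stays Pending; if AKO deletes the pod first, the condition disappears with it, which is why the autoscaler never acts.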

Current Behavior

The AKO's reconciliation logic proactively deletes Aerospike pods almost immediately after the Kubernetes scheduler marks them as Unschedulable. This happens before a cluster autoscaler has sufficient time to observe the pending pod and provision new capacity. As a result, the autoscaler never sees a stable Unschedulable pod to trigger a scale-up event. This creates a destructive feedback loop where AKO continuously deletes pods, which are then immediately recreated by the StatefulSet controller, only to be deleted again by AKO.

Details

When creating a new Aerospike cluster, pods are created and deleted endlessly, as shown below:

NAME               READY   STATUS    RESTARTS   AGE
firstcluster-0-0   0/1     Pending   0          0s
firstcluster-0-0   0/1     Pending   0          0s
firstcluster-0-2   0/1     Pending   0          0s
firstcluster-0-1   0/1     Pending   0          0s
firstcluster-0-2   0/1     Pending   0          0s
firstcluster-0-1   0/1     Pending   0          0s
firstcluster-0-2   0/1     Terminating   0          4s
firstcluster-0-2   0/1     Terminating   0          4s
firstcluster-0-1   0/1     Terminating   0          4s
firstcluster-0-1   0/1     Terminating   0          4s
firstcluster-0-0   0/1     Terminating   0          4s
firstcluster-0-0   0/1     Terminating   0          4s
firstcluster-0-2   0/1     Pending       0          0s
firstcluster-0-2   0/1     Pending       0          0s
firstcluster-0-0   0/1     Pending       0          0s
firstcluster-0-0   0/1     Pending       0          0s
firstcluster-0-1   0/1     Pending       0          0s
firstcluster-0-1   0/1     Pending       0          0s
firstcluster-0-2   0/1     Terminating   0          0s
firstcluster-0-2   0/1     Terminating   0          0s
firstcluster-0-1   0/1     Terminating   0          0s
firstcluster-0-1   0/1     Terminating   0          0s
firstcluster-0-0   0/1     Terminating   0          0s
firstcluster-0-0   0/1     Terminating   0          0s
firstcluster-0-2   0/1     Pending       0          0s
firstcluster-0-2   0/1     Pending       0          0s
firstcluster-0-0   0/1     Pending       0          0s
firstcluster-0-0   0/1     Pending       0          0s
firstcluster-0-1   0/1     Pending       0          0s
firstcluster-0-1   0/1     Pending       0          0s

In the AKO logs, you will find messages like the following:

2025-09-01T09:23:13Z	DEBUG	controller.AerospikeCluster	Check statefulSet pod running and ready	{"aerospikecluster": {"name":"firstcluster","namespace":"aerospike"}, "pod": "firstcluster-0-0"}
2025-09-01T09:23:13Z	ERROR	controller.AerospikeCluster	Failed to wait for statefulset to be ready	{"aerospikecluster": {"name":"firstcluster","namespace":"aerospike"}, "STS": {"name":"firstcluster-0","namespace":"aerospike"}, "error": "statefulSet pod firstcluster-0-0 failed: pod firstcluster-0-0 is in unschedulable state and reason is 0/23 nodes are available: 20 node(s) didn't match Pod's node affinity/selector. preemption: 0/23 nodes are available: 23 Preemption is not helpful for scheduling."}

Expected Behavior

AKO should not immediately delete Unschedulable pods. Instead, it should wait for a configurable timeout period. This would allow a cluster autoscaler to observe the pending pods and provision the necessary node capacity. Once new nodes are available, the pods can be scheduled successfully.
