Operator pod not ready

This page is linked from the JaaSOperatorPodDown alert, which fires when at least one jaas pod has reported Ready=False for the alert window (default 5m).

Symptom

ALERTS{alertname="JaaSOperatorPodDown", namespace="<jaas-ns>"}
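To see from the cluster side which pods the alert is counting, a small shell sketch can read each pod's Ready condition directly. It assumes the same app.kubernetes.io/name=jaas selector used in the diagnosis commands; jaas_ready_status and not_ready_names are hypothetical helper names, not part of the chart.

```shell
# Hypothetical helpers (not shipped with the chart).

# Print "pod<TAB>readyStatus" for every jaas pod; needs cluster access.
jaas_ready_status() {
  local ns="$1"
  kubectl --namespace "$ns" get pod \
    --selector app.kubernetes.io/name=jaas \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Read "pod<TAB>status" lines on stdin and print the pods whose Ready
# condition is anything other than "True" (False, Unknown, or missing).
not_ready_names() {
  awk -F '\t' '$2 != "True" { print $1 }'
}
```

Run it as jaas_ready_status <jaas-ns> | not_ready_names. A pod that has no Ready condition yet (for example, one stuck in Pending) prints an empty status and is listed as well.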

Cause

One of the chart’s two probes is failing, or the container never starts far enough for the probes to run.

Frequent causes:

  1. Bind failure on one of the HTTP servers (jsonnet, management, storage, metrics, webhook). The pod logs a clear “listen tcp: address already in use” or similar at boot.
  2. OOMKilled: a pathological snippet allocated a huge object, the container exceeded its memory limit, and it was OOM-killed. kubectl describe pod shows Last State: Terminated, Reason: OOMKilled.
  3. Image pull failure — registry rate limit, wrong tag, missing pull secret.
  4. TLS cert missing or unreadable when operator.webhook.enabled=true and the cert-manager Secret hasn’t materialized.
  5. Lease contention that leaves no replica as leader (the replicas keep retrying the renewal, but none holds the lease long enough to lead).
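A quick first pass over the causes above can be scripted as a pattern match over kubectl describe pod output combined with the container logs. Only the "address already in use" and "OOMKilled" strings come from this page; ErrImagePull/ImagePullBackOff and the "MountVolume.SetUp failed ... secret ... not found" message are standard kubelet reasons assumed to apply here, and treating the latter as cause 4 assumes the webhook cert is mounted from a Secret volume. Lease contention leaves no trace in this text and falls through to "unknown". triage_not_ready is a hypothetical helper name.

```shell
# Hypothetical helper: classify describe/log text read on stdin.
# Checks run in order and the first match wins. Patterns other than
# "address already in use" and "OOMKilled" are assumptions based on
# standard kubelet messages.
triage_not_ready() {
  local text
  text="$(cat)"
  if printf '%s\n' "$text" | grep -q 'address already in use'; then
    echo "bind-failure"        # cause 1
  elif printf '%s\n' "$text" | grep -q 'OOMKilled'; then
    echo "oom-killed"          # cause 2
  elif printf '%s\n' "$text" | grep -Eq 'ErrImagePull|ImagePullBackOff'; then
    echo "image-pull"          # cause 3
  elif printf '%s\n' "$text" | grep -Eq 'MountVolume\.SetUp failed.*secret .* not found'; then
    echo "secret-missing"      # possibly cause 4, if the cert is a Secret volume
  else
    echo "unknown"             # includes lease contention (cause 5)
  fi
}
```

One way to feed it is { kubectl --namespace <jaas-ns> describe pod <pod>; kubectl --namespace <jaas-ns> logs <pod> --previous; } | triage_not_ready, where --previous shows the logs of the container instance that last terminated.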

Diagnosis

# Which probes are failing? Events tell you.
kubectl --namespace <jaas-ns> describe pod --selector app.kubernetes.io/name=jaas

# Pod logs — the boot sequence prints every listener it binds.
kubectl --namespace <jaas-ns> logs --selector app.kubernetes.io/name=jaas --tail=300

# Compare against the expected listener set.
kubectl --namespace <jaas-ns> get svc --selector app.kubernetes.io/name=jaas
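The Events section of kubectl describe pod records each probe failure as a kubelet event whose message starts with "Liveness probe failed", "Readiness probe failed", or "Startup probe failed". A minimal sketch that reduces that output to the set of failing probe types; failing_probes is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print each probe type with at least one "<Type> probe failed" event,
# once per type, sorted.
failing_probes() {
  grep -Eo '(Liveness|Readiness|Startup) probe failed' \
    | awk '{ print $1 }' \
    | sort -u
}
```

Pipe the describe command into it: kubectl --namespace <jaas-ns> describe pod --selector app.kubernetes.io/name=jaas | failing_probes. With several replicas the result is the union across all pods, so describe one pod at a time to attribute failures. Events expire (after one hour by default), so an empty result for an old incident does not prove the probes are healthy.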

For OOM:

kubectl --namespace <jaas-ns> top pod --selector app.kubernetes.io/name=jaas
kubectl --namespace <jaas-ns> get pod --selector app.kubernetes.io/name=jaas --output yaml \
  | grep -A3 lastState
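Grepping the YAML works, but a jsonpath query can print each container's last termination reason directly. In this sketch jaas_last_termination and oom_killed are hypothetical helpers; the query relies on kubectl's default of printing an empty string when lastState.terminated is absent.

```shell
# Hypothetical helpers (not shipped with the chart).

# Print one line per pod: "pod<TAB>container=reason container=reason ...",
# where reason is lastState.terminated.reason (empty if the container has
# not terminated before). Needs cluster access.
jaas_last_termination() {
  local ns="$1"
  kubectl --namespace "$ns" get pod \
    --selector app.kubernetes.io/name=jaas \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.name}={.lastState.terminated.reason}{" "}{end}{"\n"}{end}'
}

# Read that output on stdin; print "pod<TAB>container" for each container
# whose previous run ended with OOMKilled.
oom_killed() {
  awk -F '\t' '{
    n = split($2, entries, " ")
    for (i = 1; i <= n; i++) {
      split(entries[i], kv, "=")
      if (kv[2] == "OOMKilled") print $1 "\t" kv[1]
    }
  }'
}
```

Usage: jaas_last_termination <jaas-ns> | oom_killed. An empty result means no container in the selected pods was OOM-killed on its previous run.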

For lease problems (multi-replica only):

kubectl --namespace <jaas-ns> get lease <release-name>-operator --output yaml

A holderIdentity that changes on every renewal interval points to network flakiness or apiserver pressure: the replicas cannot renew the lease reliably, so it keeps changing hands.
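To tell a flapping lease from a stable one, sample holderIdentity a few times and count how often it changes. This is a rough sketch: the sample count (12) and interval (5 seconds) are arbitrary defaults, not values taken from the operator's lease settings, and sample_lease_holder and count_holder_changes are hypothetical helper names.

```shell
# Hypothetical helpers (not shipped with the chart).

# Print the lease's holderIdentity once per interval; needs cluster access.
# An empty line means nobody held the lease at that moment.
sample_lease_holder() {
  local ns="$1" lease="$2" samples="${3:-12}" interval="${4:-5}" i=0
  while [ "$i" -lt "$samples" ]; do
    kubectl --namespace "$ns" get lease "$lease" \
      --output jsonpath='{.spec.holderIdentity}{"\n"}'
    sleep "$interval"
    i=$((i + 1))
  done
}

# Read one holder per line on stdin; print how many times it changed.
count_holder_changes() {
  awk 'NR > 1 && $0 != prev { changes++ } { prev = $0 } END { print changes + 0 }'
}
```

Usage: sample_lease_holder <jaas-ns> <release-name>-operator | count_holder_changes. A stable leader yields 0; repeated changes, or empty samples (no holder, as in cause 5), suggest the lease is not being kept.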

Remediation

Prevention