Self-signed webhook cert renewal failing

Fires when the increase in jaas_webhook_cert_renewal_failures_total exceeds the configured per-hour threshold. The Renewer background goroutine rotates the self-signed TLS material every Validity / 3 (roughly every four months for a one-year cert). When it can’t, the existing cert keeps working until its natural expiry, at which point the apiserver stops trusting the chain and every JsonnetSnippet admission fails cluster-wide with x509 errors.
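As a rough guide to how much time the alert leaves you, the Validity / 3 schedule can be turned into day counts. This is a minimal sketch: the function names are hypothetical, and it assumes that each successful rotation issues a cert with the full validity and that the first failure happens at the first scheduled renewal after that.

```shell
# Days between renewal attempts for a cert valid for $1 days (Validity / 3,
# integer division).
renewal_interval_days() {
  echo $(( $1 / 3 ))
}

# Days left on the current cert when the first renewal attempt fails,
# assuming the cert was issued with full validity one interval earlier.
headroom_days() {
  echo $(( $1 - $1 / 3 ))
}

renewal_interval_days 365   # prints 121
headroom_days 365           # prints 244
```

The real expiry date of the serving cert is what matters in an incident; these numbers only indicate the order of magnitude.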
Symptom
- JaaSWebhookCertRenewalFailing alert is firing (severity: critical).
- Operator pod logs carry repeated Self-signed webhook cert renewal failed warnings at the Renewer.Interval cadence.
- kubectl describe validatingwebhookconfiguration <name> shows a caBundle that hasn’t rotated since the failures started.
- The pod stays Ready=True; the renewer’s failures don’t gate the readiness probe.
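To confirm the log symptom quickly, the warnings can be counted rather than eyeballed. The helper below is a sketch: count_renewal_failures is a hypothetical name, and NS and DEPLOY in the usage comment are placeholders for your namespace and operator deployment.

```shell
# Count renewal-failure warnings in log text read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper usable under "set -e".
count_renewal_failures() {
  grep -c -F 'Self-signed webhook cert renewal failed' || true
}

# Usage against a live cluster (set NS and DEPLOY first):
#   kubectl --namespace "$NS" logs "deployment/$DEPLOY" --since=24h \
#     | count_renewal_failures
```

A count that grows by one per Renewer.Interval matches the cadence described above.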
Diagnosis
The most common causes, in order of frequency:
Cause A — RBAC drift on the named VWC
The operator’s ClusterRole pins resourceNames: [<VWCName>] on the validatingwebhookconfigurations patch verb. A chart upgrade that changes operator.webhook.vwcName (or a manual chart edit) leaves the running pod patching a name it no longer has permission for.
kubectl auth can-i patch validatingwebhookconfiguration/<vwc-name> \
--as=system:serviceaccount:<namespace>:<operator-sa>
If the answer is “no”, the chart’s operator-cluster ClusterRole needs the current VWC name added to resourceNames (or the running pod restarted to pick up the new name).
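The same check can be wrapped so that it returns a verdict instead of a bare yes/no. This is a sketch with hypothetical function names; it relies on kubectl auth can-i exiting non-zero when the action is not allowed.

```shell
# Exit 0 when the service account may patch the named VWC.
# $1 = VWC name, $2 = operator namespace, $3 = operator service account
can_patch_vwc() {
  kubectl auth can-i patch "validatingwebhookconfiguration/$1" \
    --as="system:serviceaccount:$2:$3" --quiet
}

# Print a one-line verdict. Any non-zero exit (including a failure to
# reach the cluster) lands in the "denied" branch, so read it with care.
check_vwc_rbac() {
  if can_patch_vwc "$1" "$2" "$3"; then
    echo "allowed: $3 can patch $1"
  else
    echo "denied: $1 is missing from resourceNames, or the pod uses a stale name"
  fi
}

# Usage: check_vwc_rbac "$VWC_NAME" "$NS" "$OPERATOR_SA"
```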
Cause B — VWC renamed out from under the operator
A separate controller (admission policy automation, GitOps drift correction) renamed the VWC. The operator is patching a stale name.
kubectl get validatingwebhookconfigurations \
--selector 'app.kubernetes.io/instance=<release-name>'
If the live name differs from the operator’s --webhook-validating-config-name flag, redeploy the operator with the correct flag or rename the VWC back.
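To see which name the running operator is actually configured with, the flag can be read back from the deployment spec. The sketch below assumes the flag is passed in --flag=value form inside the container args; configured_vwc_name is a hypothetical helper, and NS, DEPLOY and RELEASE are placeholders.

```shell
# Extract the configured VWC name from container args text on stdin.
# Assumes "--webhook-validating-config-name=<value>" (equals-sign form).
configured_vwc_name() {
  grep -o -e '--webhook-validating-config-name=[A-Za-z0-9.-]*' \
    | head -n 1 | cut -d= -f2-
}

# Usage against a live cluster (set NS, DEPLOY and RELEASE first):
#   kubectl --namespace "$NS" get deployment "$DEPLOY" \
#     -o jsonpath='{.spec.template.spec.containers[*].args}' \
#     | configured_vwc_name
#   kubectl get validatingwebhookconfigurations \
#     --selector "app.kubernetes.io/instance=$RELEASE" -o name
```

If the first command prints a name that does not appear in the output of the second, the operator is patching a stale name.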
Cause C — CertDir gone read-only
The chart mounts CertDir as an emptyDir by default. A kubectl apply that adds a readOnlyRootFilesystem: true security context, or a sidecar that re-mounts the volume, can break writes.
kubectl --namespace <ns> exec <operator-pod> -- ls -l /tmp/k8s-webhook-server/serving-certs/
kubectl --namespace <ns> exec <operator-pod> -- touch /tmp/k8s-webhook-server/serving-certs/.write-probe
If the touch fails, the security-context or volume mount needs fixing.
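The write probe leaves a stray file behind when it succeeds. A small wrapper can run the probe and clean up after itself; this is a sketch with a hypothetical function name, and it assumes the operator image ships touch and rm, as the commands above already do for ls and touch.

```shell
# Probe whether the cert directory is writable, removing the probe file
# afterwards. $1 = namespace, $2 = operator pod name.
probe_cert_dir() {
  probe=/tmp/k8s-webhook-server/serving-certs/.write-probe
  if kubectl --namespace "$1" exec "$2" -- touch "$probe"; then
    kubectl --namespace "$1" exec "$2" -- rm -f "$probe"
    echo "writable"
  else
    echo "read-only or missing"
    return 1
  fi
}

# Usage: probe_cert_dir "$NS" "$OPERATOR_POD"
```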
Remediation
Fix the root cause (RBAC, name, or mount).
Roll the operator pod to force a fresh bootstrap of the cert and a re-patch of the VWC:
kubectl --namespace <ns> rollout restart deployment <operator-deployment>
The new pod’s bootstrap path goes through the dual-CA union (DD8), so existing replicas stay trusted across the rotation.
Verify renewal is healthy after the bounce: the jaas_webhook_cert_renewal_failures_total counter should stop increasing, and the alert clears once the for: window passes.
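One way to check that the counter has stopped increasing is to sample it twice and compare. The sketch below sums the counter from Prometheus text-format metrics on stdin; failures_total is a hypothetical helper, METRICS_URL is a placeholder for wherever the operator exposes its metrics, and the parsing assumes label values contain no spaces.

```shell
# Sum every series of the failure counter from Prometheus text-format
# metrics read from stdin. "# HELP" and "# TYPE" lines are skipped because
# their first field is "#".
failures_total() {
  awk '$1 ~ /^jaas_webhook_cert_renewal_failures_total/ { sum += $2 }
       END { print sum + 0 }'
}

# Usage (set METRICS_URL first; wait at least one renewal interval
# between the two samples):
#   before=$(curl -s "$METRICS_URL" | failures_total)
#   after=$(curl -s "$METRICS_URL" | failures_total)
#   [ "$after" = "$before" ] && echo "counter is flat"
```

Note that a restarted pod starts its counter from zero, so compare samples taken after the rollout completes.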
When to consider switching to cert-manager
If the self-signed renewer keeps tripping over your environment’s RBAC story or pod-security policies, the chart supports operator.webhook.certMode: cert-manager — cert-manager handles the rotation and the operator mounts the resulting secret. Trade-off: requires cert-manager installed and an Issuer configured.