High reconcile latency see history edit this page

Talks about: , , and

Linked from the JaaSReconcileLatencyHigh alert. Fires when the controller-runtime controller_runtime_reconcile_time_seconds histogram p99 exceeds the configured threshold (default 30s) for the alert window.

Symptom

ALERTS{alertname="JaaSReconcileLatencyHigh", controller="jsonnetsnippet"}

Cause

Reconcile latency is the wall-clock cost of one Reconcile() call. Inside the call, JaaS does (in order):

  1. Get the snippet from the cache.
  2. Run the dependency-cycle BFS (one Get per touched node).
  3. Resolve the source (inline files, or sourceRef → Fetcher: source-CR Get + tarball HTTP fetch + tar extract).
  4. Resolve libraries (one Get per LibraryRef).
  5. Evaluate the snippet via go-jsonnet.
  6. Publish via the storage backend (Put).
  7. Status update + ExternalArtifact upsert.

Slow reconciles are almost always one of:

Diagnosis

# Where is the time going? OTel spans break Reconcile into sub-stages.
# Requires --tracing-endpoint set on the operator.
kubectl --namespace <jaas-ns> get deploy jaas \
  --output jsonpath='{.spec.template.spec.containers[0].args}' \
  | tr ',' '\n' | grep tracing

# Without tracing: the histograms expose enough to triangulate.
kubectl --namespace <jaas-ns> port-forward svc/jaas-metrics 8083:8083 &
curl -s localhost:8083/metrics | grep -E 'reconcile_time|rendered_bytes'

The jaas_snippet_rendered_bytes histogram tells you whether a slow Publisher is the cause (large outputs) vs. a slow Fetcher (small outputs but the histogram is dominated by upstream IO).

For a single suspect snippet, force a reconcile under load and observe:

kubectl annotate jsonnetsnippet <ns>/<name> \
  jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite
kubectl --namespace <jaas-ns> logs deploy/jaas --tail=50 | grep <name>

Remediation

Prevention