Service level objectives

The JaaS operator tracks two service-level objectives. Each is an objective on a service-level indicator (SLI) computed from the operator's metrics and measured over a rolling window. The published dashboard renders both, and the Helm chart can alert on them.
SLO 1 — reconcile availability
Objective: ≥ 99% of syncing reconciles reach Ready=True, over a 28-day
window.
The SLI counts only reconciles that were actually trying to sync — the intentional
Pending and Suspended states are excluded from both halves:
sum(rate(jaas_snippet_reconcile_total{status="True"}[28d]))
/
(
  sum(rate(jaas_snippet_reconcile_total{status="True"}[28d]))
  + sum(rate(jaas_snippet_reconcile_total{status="False",reason!~"Suspended|Pending"}[28d]))
)
The error budget is the 1% of reconciles allowed to fail over the window.
Remaining budget, normalised so 1 is full and 0 is exhausted:
(<availability> - 0.99) / (1 - 0.99)
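As a worked example: at 99.5% measured availability the expression gives (0.995 - 0.99) / (1 - 0.99) = 0.5, so half the budget is left. At exactly 99% it gives 0, and below 99% it goes negative, meaning the budget is overspent. A sketch of the same expression with the SLO 1 availability query substituted for the <availability> placeholder, assuming the 0.99 objective and 28d window stated above, might look like this:

```promql
# Remaining error budget: 1 = untouched, 0 = exhausted, negative = overspent.
# Assumes the 99% objective; change both 0.99 literals together for another target.
(
  (
    sum(rate(jaas_snippet_reconcile_total{status="True"}[28d]))
    /
    (
      sum(rate(jaas_snippet_reconcile_total{status="True"}[28d]))
      + sum(rate(jaas_snippet_reconcile_total{status="False",reason!~"Suspended|Pending"}[28d]))
    )
  )
  - 0.99
)
/ (1 - 0.99)
```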
SLO 2 — reconcile latency
Objective: the JsonnetSnippet controller’s p95 reconcile duration stays below 30s over the window.
histogram_quantile(0.95, sum by (le) (
  rate(controller_runtime_reconcile_time_seconds_bucket{controller="jsonnetsnippet"}[28d])
))
See the SLOs on the dashboard
The published dashboard opens with an SLO band: current availability against its objective, error budget remaining, p95 latency against its objective, and an availability-versus-objective trend. The operator-internals panels below explain any movement.
The objectives and window are top-level arguments, so you set them per environment
when you render the dashboard through a JsonnetSnippet:
spec:
  tlas:
    datasource: ["prometheus"]     # your Prometheus datasource UID
    window: ["28d"]                # SLO window
    availabilityTarget: ["0.99"]   # 99%
    latencyTarget: ["30"]          # seconds
A short window is fine for a demo; a real 28d SLI needs at least that much
Prometheus retention. For long windows, precompute the SLI with a recording rule
and point window at the recorded series instead of a raw rate(...[28d]).
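One way to do that precomputation, sketched here as a plain Prometheus rule group, is to record short-window rates and then aggregate the recorded series over the SLO window. The group name, the 5m rate window, the 1m evaluation interval, and the two `jaas:` series names are hypothetical, chosen only for illustration. Whether the chart's rule passthrough accepts recording rules is not confirmed here, so this is shown as a standalone rule file.

```yaml
groups:
  - name: jaas-slo-recording            # hypothetical group name
    interval: 1m                        # assumed evaluation interval
    rules:
      # Reconciles per second that reached Ready=True (5m rate).
      - record: jaas:reconcile_success:rate5m      # hypothetical series name
        expr: |
          sum(rate(jaas_snippet_reconcile_total{status="True"}[5m]))
      # Reconciles per second that were trying to sync: successes plus
      # failures, excluding the intentional Suspended and Pending states.
      - record: jaas:reconcile_attempted:rate5m    # hypothetical series name
        expr: |
          sum(rate(jaas_snippet_reconcile_total{status="True"}[5m]))
          + sum(rate(jaas_snippet_reconcile_total{status="False",reason!~"Suspended|Pending"}[5m]))
```

With those series recorded, availability over the window can be approximated as `sum_over_time(jaas:reconcile_success:rate5m[28d]) / sum_over_time(jaas:reconcile_attempted:rate5m[28d])`. That query reads one pre-aggregated sample per evaluation interval rather than every raw counter sample in the window.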
Alert on the budget
The shipped alerts already page on the causes of SLO loss (reconcile errors,
latency, eval saturation). To alert on the objective itself, add an
availability-SLO rule through the chart’s extraRules passthrough. Here it fires
when recent availability drops below 99%:
operator:
  metrics:
    prometheusRule:
      enabled: true
      extraRules:
        - alert: JaaSReconcileAvailabilityBelowSLO
          expr: |
            (
              sum(rate(jaas_snippet_reconcile_total{status="True"}[1h]))
              /
              (
                sum(rate(jaas_snippet_reconcile_total{status="True"}[1h]))
                + sum(rate(jaas_snippet_reconcile_total{status="False",reason!~"Suspended|Pending"}[1h]))
              )
            ) < 0.99
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: JaaS reconcile availability is below its 99% objective
The alert measures a short recent window (1h) so it pages while the budget is
actively burning; the dashboard’s window shows the full SLO window. See Alerting
for the rest of the catalog.