Scale and capacity see history edit this page

Talks about: , , , , , and

Size and tune JaaS for the snippet workload you run. This page covers how a single replica’s CPU and memory scale with evaluation load, what adding replicas does (and doesn’t) buy you, the knobs that govern throughput, and the metrics that tell you when you’re at capacity.

Sizing a replica

A replica’s resource use is driven by two things: Jsonnet evaluation (CPU) and the per-fetch byte budget plus eval working set (memory).

CPU is evaluation. Each in-flight evaluation pins roughly one CPU for its working set, and go-jsonnet has no mid-evaluation cancellation — once an evaluation starts it runs to completion or to its memory ceiling, even after --evaluation-timeout (default 5s) fires on the parent. Provision CPU for the worst-case concurrent eval load, not the average.

Memory has a bounded worst case. Resident memory during a publish is approximately:

concurrent fetches × fetch byte budget  +  concurrent evals × per-eval working set

The fetch byte budget is bounded by internal caps in the source fetcher (these are not flags — they are fixed constants, not operator-tunable):

CapValueBounds
MaxArchiveBytes64 MiBthe compressed download body
MaxExtractedBytes64 MiBthe sum of every extracted entry held in memory
MaxPerEntryBytes16 MiBany single tar entry
MaxDecompressedBytes512 MiBthe gzip stream’s decompressed output (gzip-bomb defence)

Concurrent downloads are themselves capped (4 in-flight), so the peak ephemeral-storage cost of in-flight downloads is bounded by that count × the 64 MiB archive cap. The render output adds to this: cap it per snippet with --max-artifact-bytes (default 0, disabled — snippets that exceed it fail with ReasonArtifactTooLarge), and watch the jaas_snippet_rendered_bytes histogram to see the real distribution before you pick a value.

The chart defaults (64 MiB memory, 32m CPU) are fine for a quickstart but will OOM under sustained rendering. A reasonable starting point for a production operator:

resources:
  memory: 256Mi
  cpu: 100m

operator:
  storage:
    maxArtifactBytes: 16777216  # 16 MiB; snippets above this fail ReasonArtifactTooLarge

Raise the memory request toward concurrent evals × largest expected artifact plus the fetch budget if you render large dashboards or run many tenants. See Production for the same sizing decision in the context of a full hardening checklist.

Replicas and HA

JaaS uses controller-runtime leader election, on by default whenever --enable-flux-integration is set (--leader-election, true). The lease lives at --leader-election-namespace / --leader-election-id (jaas-operator). The lease is released on shutdown (LeaderElectionReleaseOnCancel), so a rolling update hands off in seconds rather than waiting out the lease duration.

What scales by adding replicas, and what does not:

ComponentRuns onScales with replicas
Jsonnet render HTTP serverevery replicaYes — render throughput is per-pod
Management probes (/live, /ready, /start)every replicaYes
Storage HTTP server (artifact downloads)every replicaYes
Validating admission webhookevery replicaYes
Reconcilers + the manager cache + storage writesleader onlyNo — reconcile is single-leader

Adding replicas raises HTTP render and download throughput and gives the webhook and probes redundancy, but it does not parallelise reconciliation: only the lease-holder reconciles and writes artifacts. The non-leaders stand by to take the lease over on failure.

Because every replica serves artifact downloads but only the leader writes them, multi-replica installs require shared-readable storage: either the s3 backend (--storage-backend=s3 — all replicas read the same bucket, the leader writes) or the local backend on an RWX PVC. The default local backend on an emptyDir or RWO PVC is single-replica only. The backend matrix and the full HA setup live in Storage and HA ; pick a backend there before scaling out.

Tuning throughput

Each knob below names what it controls, its default, when to move it, and the metric that signals the need.

--max-concurrent-evals

Global cap on in-flight Jsonnet evaluations across the HTTP and operator paths. Default is max(GOMAXPROCS × 4, 16). Excess work is rejected — HTTP requests get a 503, the operator requeues with backoff. 0 disables the gate entirely.

--max-stack

Maximum Jsonnet call-stack depth, default 500; 0 falls back to go-jsonnet’s own default. Raise only for legitimately deep recursive templates that hit the limit; keep it bounded so a runaway recursive snippet can’t exhaust the stack.

--evaluation-timeout

Maximum wall-clock time a single evaluation may take, default 5s; 0 disables it. Lower it to fail pathological snippets faster; raise it for genuinely expensive renders. Because evaluation is uncancellable, a timed-out parent leaves the eval goroutine running until it finishes naturally — watch jaas_eval_outstanding_timed_out, which counts exactly those orphans. A sustained non-zero value means your timeout is shorter than some snippet’s real cost.

--rerender-rate and --rerender-burst

The per-snippet token bucket that bounds steady-state re-render churn. Defaults are 60/min (rate, as N/period where period is sec|min|hour) and 120 (burst depth). Raise them for snippets whose upstream sources legitimately change faster than the budget; lower them to throttle a snippet whose update cadence is hammering the operator. The jaas_snippet_rate_limited_total counter (paired with a RateLimited Warning event) tells you when a snippet is hitting its bucket.

spec.interval and --storage-sweep-interval

A snippet’s spec.interval sets how often the operator re-renders it absent a watch event — lengthen it for snippets that rarely change to cut reconcile load. --storage-sweep-interval (default 10m; 0 disables) governs the background GC that clears orphaned .tar.gz.tmp residue; the sweep is off the hot reconcile path, so leave it unless jaas_storage_sweep_failures_total is climbing.

Knowing when you’re at capacity

Watch these saturation signals; each maps to a starter alert in the opt-in PrometheusRule (see Alerting ):

SignalAt capacity whenAlert
jaas_eval_in_flight vs jaas_eval_max_concurrentin-flight pegged at the capJaaSEvalSaturation (ratio ≥ 0.9 for 10m, guarded on the cap being non-zero)
jaas_eval_unavailable_totalclimbing — evals are being rejected(component of eval-saturation diagnosis)
jaas_eval_outstanding_timed_outsustained non-zero — evals outrun the timeoutwatch directly
controller_runtime workqueue depthcan’t drainJaaSControllerWorkqueueDepthHigh (depth > 50 for 15m)
reconcile p95 latencycrosses the SLO ceilingJaaSReconcileLatencyHigh

These signals, the dashboard that plots them, and the reconcile SLOs are detailed under Observability — start with Metrics , Alerting , and the SLOs .

To get a throughput baseline for your hardware before sizing, run the in-repo benchmarks: BenchmarkReconcile_HappyPath / BenchmarkReconcile_Concurrent in internal/operator, the BenchmarkEvaluateAnonymousSnippet_* evaluation benchmarks in internal/eval, and BenchmarkStore_Put / BenchmarkStore_PutLargeArtifact in internal/storage.

Next steps