# Jsonnet-as-a-Service — full documentation

> The complete Jsonnet-as-a-Service documentation (https://jaas.projects.metio.wtf/) concatenated for
> LLMs. For a concise link index see https://jaas.projects.metio.wtf/llms.txt.


# Jsonnet-as-a-Service

**Jsonnet-as-a-Service (JaaS)** evaluates [Jsonnet](https://jsonnet.org/) and
returns JSON. It runs in one of two modes:

- **OCI volume mounting** — the chart mounts your snippets and libraries from OCI
  artifacts as image volumes, and JaaS serves the evaluated JSON over HTTP
  (`GET /jsonnet/<snippet>`). Static content, no custom resources.
- **Flux CR-based** — JaaS watches `JsonnetSnippet` and `JsonnetLibrary` resources
  and publishes the rendered output as a Flux
  [`ExternalArtifact`](https://fluxcd.io/flux/components/source/externalartifacts/)
  that any Flux consumer deploys.

The two modes are mutually exclusive in one chart release;
[Installation](/installation/kubernetes/) covers choosing one.

## What teams build with it

- **Grafana dashboards with grafana-operator.** Author dashboards in Jsonnet
  (grafonnet), let JaaS render them, and have the
  [grafana-operator](https://grafana.github.io/grafana-operator/) reconcile the
  result into Grafana. See [Grafana dashboards](/tutorials/grafana-dashboards/).
- **Kubernetes manifests with stageset-controller.** Render manifests from
  Jsonnet and roll them out in ordered, gated stages with
  [stageset-controller](https://stageset.projects.metio.wtf/). See
  [Deploying manifests with StageSet](/tutorials/deploying-manifests/).

Both build on the same core: a snippet renders to an `ExternalArtifact`, and a
downstream controller consumes it. JaaS only renders — what happens to the JSON is
the consumer's concern, documented on that consumer's own site.

## Where to start

- [Quickstart](/tutorials/quickstart/) — from a Helm install to a published
  artifact in a few steps.
- [Tutorials](/tutorials/) — the two integrations above, plus running JaaS as a
  cluster-free local renderer.
- [Usage](/usage/) — one page per feature, for both the HTTP renderer and the
  operator.
- [Installation](/installation/) — Helm install, production hardening, and the
  full configuration reference.
- [API reference](/api/) — every field of `JsonnetSnippet`, `JsonnetLibrary`, and
  the `ExternalArtifact` output contract.
- [Runbooks](/runbooks/) — symptom, cause, and remediation for every
  Ready-condition reason.

## Project

- Source, releases, and the container image: [github.com/metio/jaas](https://github.com/metio/jaas)
- Helm chart: [`oci://ghcr.io/metio/helm-charts/jaas`](https://github.com/metio/helm-charts/tree/main/charts/jaas)


---

# Installation

Source: https://jaas.projects.metio.wtf/installation/


Install JaaS with [Helm](https://helm.sh/) in one of two mutually exclusive
shapes — the stateless HTTP renderer or the Flux operator — then harden it for
production and operate it day to day. The
[configuration reference](/installation/configuration/) lists every flag and its
chart value.


---

# Configuration reference

Source: https://jaas.projects.metio.wtf/installation/configuration/


Every JaaS flag is listed here with its default and a one-line description. Run
`jaas --help` to see the same list at runtime. The tables on this page are
generated from the binary's own flag definitions, so they never drift from the
runtime contract.

The Helm chart exposes most flags under `arguments.*`; operator-specific flags
are under `operator.*`. The full set of chart values is in the
[Helm chart values](/installation/helm-values/) reference.

## Jsonnet server

The Jsonnet server evaluates snippets and returns JSON. It binds on
`--listen-address:--port` by default.

{{< flag-table group="Jsonnet server" >}}

## Management server

The management server exposes the three Kubernetes probe endpoints. It binds on
`--management-listen-address:--management-port`.

{{< flag-table group="Management server" >}}

Endpoints: `GET /start` (startup probe), `GET /ready` (readiness probe),
`GET /live` (liveness probe). Startup and readiness return `503` with a
`{"status":"…"}` JSON body when the server is not yet ready. Liveness is an
unconditional `200`.
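
To see the three probes from a workstation, you can print the HTTP status of each one. This is a minimal sketch: `probe_codes` is a hypothetical helper name, and the base URL assumes you have made the management port reachable (for example with a port-forward).

```shell
# probe_codes BASE_URL
# Prints "<path> <http-status>" for each management probe endpoint.
# Assumes curl is installed and BASE_URL points at the management server.
probe_codes() {
  base_url="$1"
  for path in /start /ready /live; do
    # -s silences progress, -o discards the body, -w prints only the status.
    code="$(curl -s -o /dev/null -w '%{http_code}' "${base_url}${path}")"
    printf '%s %s\n' "$path" "$code"
  done
}

# Usage (hypothetical address):
#   probe_codes http://localhost:8081
```

A `503` on `/start` or `/ready` next to a `200` on `/live` matches the "not yet ready" state described above.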

## Snippets and libraries

Flags for declaring the Jsonnet files the server serves.

{{< flag-table group="Snippets and libraries" >}}

Snippet name resolution uses Go's `os.OpenRoot`, which rejects `..` traversal
and symlinks that escape the configured directory. This is security-critical;
see [Evaluation and security](/usage/evaluation-and-security/).

## External variables

{{< flag-table group="External variables" >}}

**Environment variable alternative:** set `JAAS_EXT_VAR_<NAME>=<VALUE>` to
expose `<NAME>` as an external variable. The `--ext-var` flag overrides the env
mechanism on key conflict. See
[External variables and TLAs](/usage/external-variables-and-tlas/) for usage
examples.
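
When debugging which values reach a snippet, it can help to list the external variables the environment currently defines. The sketch below relies only on the `JAAS_EXT_VAR_<NAME>` naming convention described above; `list_env_ext_vars` is a hypothetical helper, and the variable names in the usage comment are illustrative.

```shell
# list_env_ext_vars
# Prints one "<NAME>=<VALUE>" line for every JAAS_EXT_VAR_<NAME> variable
# found in the current environment, sorted by name.
list_env_ext_vars() {
  # sed strips the prefix and prints only the lines where it matched.
  env | sed -n 's/^JAAS_EXT_VAR_//p' | sort
}

# Usage (illustrative variable):
#   export JAAS_EXT_VAR_image=nginx:1.27
#   list_env_ext_vars
```

Any name listed here is one that a matching `--ext-var` flag would override on a key conflict.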

## Evaluation limits

{{< flag-table group="Evaluation limits" >}}

When `--evaluation-timeout` elapses, JaaS ends the HTTP request, but it does not
terminate the underlying go-jsonnet call: the evaluation keeps consuming CPU
until it finishes on its own. Size container resources accordingly and use
`--max-concurrent-evals` to bound the worst-case goroutine pile-up. See
[Evaluation and security](/usage/evaluation-and-security/) for the full
discussion.

## Lifecycle

{{< flag-table group="Lifecycle" >}}

## Operator (Flux integration)

The following flags are only active when `--enable-flux-integration` is set.

{{< flag-table group="Operator (Flux integration)" >}}

**Environment variable:** `JAAS_WATCH_NAMESPACES` — comma-separated namespace
list. Superseded by `--watch-namespaces` when both are set.

## Storage server (local and S3)

The storage server is the HTTP file server that downstream Flux consumers fetch
artifacts from. It is started only when `--enable-flux-integration` is set.

{{< flag-table group="Storage server (local and S3)" >}}

### S3 flags

Active only when `--storage-backend=s3`.

{{< flag-table group="S3 flags" >}}

## Webhook (TLS provisioning)

Active only when `--enable-webhook` is set (which also requires
`--enable-flux-integration`).

{{< flag-table group="Webhook (TLS provisioning)" >}}

See [Admission webhook](/usage/admission-webhook/) for the full `failurePolicy`
trade-off and cert rotation details.

## Leader election

{{< flag-table group="Leader election" >}}

## Observability

### Metrics

{{< flag-table group="Metrics" >}}

### Tracing

{{< flag-table group="Tracing" >}}

## Logging and lifecycle

{{< flag-table group="Logging and lifecycle" >}}


---

# Helm chart values

Source: https://jaas.projects.metio.wtf/installation/helm-values/


The jaas Helm chart lives in the
[metio/helm-charts](https://github.com/metio/helm-charts/tree/main/charts/jaas)
monorepo and is published at `oci://ghcr.io/metio/helm-charts/jaas`. The tables
below are generated from each chart's `values.yaml`, so they track the chart's
current values rather than a hand-maintained copy.

For how the values map onto the binary's runtime behaviour, see the
[Configuration reference](/installation/configuration/) — every `arguments.*`
value drives the corresponding `--flag`.

## jaas chart

{{< helm-values data="helm-values" >}}

## joi library chart

The [joi](https://github.com/metio/helm-charts/tree/main/charts/joi) chart
publishes [Jsonnet OCI Images](https://github.com/metio/jsonnet-oci-images) as
`JsonnetLibrary` + `OCIRepository` pairs, so snippets can import vendored
libraries (grafonnet, k8s-libsonnet, …) without bundling them. Deploy it
alongside jaas when snippets reference shared libraries.

{{< helm-values data="joi-values" >}}


---

# Kubernetes

Source: https://jaas.projects.metio.wtf/installation/kubernetes/


JaaS ships as a container image at `ghcr.io/metio/jaas:latest` and as a Helm
chart at `oci://ghcr.io/metio/helm-charts/jaas`. Pre-built binaries for Linux,
macOS, and Windows are attached to each GitHub release for operators who prefer
to run the binary directly.

## Prerequisites

- A [Kubernetes](https://kubernetes.io/) cluster, **v1.28 or later**, with
  `kubectl` configured against it.
- [Helm](https://helm.sh/) **v3.14 or later** — OCI chart support is required to
  pull the chart from `ghcr.io`.

The Flux CR-based mode (below) additionally needs:

- [Flux](https://fluxcd.io/) **v2.7.0 or later** in the cluster, because the
  `ExternalArtifact` kind that JaaS publishes was introduced in v2.7.0.
- [cert-manager](https://cert-manager.io/) — **only** if you set the admission
  webhook to `cert-manager` mode. The chart defaults to `self-signed`, which
  provisions and rotates the webhook's TLS in-process and needs no cert-manager;
  see [Production](/installation/production/#admission-webhook-tls) for the
  trade-off.

The OCI volume-mounting mode needs neither Flux nor cert-manager.

## Install and update

`helm upgrade --install` is idempotent: the same command installs the chart the
first time and applies your changes on every subsequent run, so it's the only
deploy command you need. To update later, re-run it with an updated `--values`
file or `--set` flags.

The chart runs JaaS in one of two mutually exclusive modes in a single release.
Pick the one that matches your use case; you **cannot** combine them in one
release — the chart's pre-install preflight rejects the combination.

### Mode 1 — OCI volume mounting (HTTP renderer)

JaaS evaluates Jsonnet snippets on demand and returns JSON over HTTP. Snippets and
libraries are mounted into the pod from OCI artifacts as image volumes (the
`snippets` and `additionalLibraries` chart values), read straight from a registry.
There are no CRDs, no leader election, and no persistent storage — the pod is
stateless.

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --values my-values.yaml \
  --wait
```

A minimal `my-values.yaml` — `snippets` and `additionalLibraries` are maps of
`name: image-reference`:

```yaml
# Snippets to render — a map of name: image. The name becomes the URL path, so
# this snippet is served at GET /jsonnet/dashboards.
snippets:
  dashboards: ghcr.io/my-org/my-dashboards:latest

# Well-known libraries have a built-in toggle — enable grafonnet with one flag
# (the chart already knows its JOI image). docsonnet and xtd work the same way.
libraries:
  grafonnet:
    enabled: true

# additionalLibraries mounts any OTHER library image — a JOI library without a
# built-in toggle, or your own private bundle. The map KEY is the directory the
# image mounts under and that the renderer adds to its import search path
# (`--library-path /srv/libraries/<key>`); it must be unique. The entry below
# mounts ghcr.io/acme/jsonnet-acme-lib at /srv/libraries/acme.
additionalLibraries:
  acme: ghcr.io/acme/jsonnet-acme-lib:latest
```

The chart mounts each image read-only and wires the renderer for you. The
`dashboards` snippet is then reachable at `GET /jsonnet/dashboards`. A library is
imported by the path it resolves to under its search directory — for a
jb-vendored image like grafonnet, the full vendor path baked into it:

```jsonnet
import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet'
```

The Jsonnet HTTP server listens on port `8080` (configurable via `ports.http`).

### Mode 2 — Flux CR-based (operator)

JaaS watches `JsonnetSnippet` and `JsonnetLibrary` CRs, evaluates snippets, and
publishes the results as `ExternalArtifact` resources. Downstream Flux consumers
(kustomize-controller, helm-controller, stageset-controller) fetch the rendered
JSON from the artifact server.

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --set operator.enabled=true \
  --set operator.storage.persistence.enabled=true \
  --wait
```

A minimal values snippet for the operator shape:

```yaml
operator:
  enabled: true
  storage:
    # local backend with a PVC — enough for a single-replica install. For
    # multi-replica HA, switch to backend: s3 (see /installation/production/).
    backend: local
    persistence:
      enabled: true
      size: 10Gi
```

The operator publishes artifacts at the URL configured via
`operator.storage.baseURL`. Left empty, it defaults to the in-cluster Service
DNS name (`http://jaas-storage.<namespace>.svc.cluster.local:<port>`), which is
correct when downstream Flux consumers fetch artifacts from inside the cluster.
Set it explicitly only when consumers dereference the artifacts through an
Ingress or external hostname.
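
To check which URL consumers actually receive, you can read it back from a published artifact. This is a hedged sketch: `artifact_url` is a hypothetical helper, it assumes `kubectl` access to the namespace, and the `ExternalArtifact` name is passed in explicitly because this page does not state how it is derived.

```shell
# artifact_url NAMESPACE NAME
# Prints status.artifact.url of the named ExternalArtifact.
artifact_url() {
  namespace="$1"
  name="$2"
  kubectl --namespace "$namespace" get externalartifact "$name" \
    --output jsonpath='{.status.artifact.url}'
}

# Usage (illustrative names):
#   artifact_url default web-app
```

If the printed host is not resolvable from where your consumers run, that is the case for setting `operator.storage.baseURL` explicitly.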

### How CRDs are handled

The chart ships its CRDs (`JsonnetSnippet`, `JsonnetLibrary`) inside the regular
templates — not Helm's special `crds/` directory — so a `helm upgrade --install`
applies schema changes like any other resource, governed by `crds.create`
(default `true`). The CRDs carry `helm.sh/resource-policy: keep`, so a
`helm uninstall` leaves them — and your existing resources — in place; remove them
by hand only if you really mean to.

Check [MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md)
before upgrading across a release that changes an immutable field such as a
Deployment's `spec.selector.matchLabels` — those require a manual
`kubectl --namespace jaas-system delete deploy jaas` first.

If you manage CRDs out of band, the raw definitions are published in the
repository under `config/crd/bases/` and can be applied with
`kubectl apply --server-side -f`.

## Customize

Every setting the chart exposes — the two modes above, storage backend, leader
election, the admission webhook, NetworkPolicy, service mesh, metrics, and the
rest — is a Helm value. Two references cover them:

- [Helm chart values](/installation/helm-values/) — the full values reference,
  generated from the chart's own schema.
- [Configuration reference](/installation/configuration/) — every binary flag and
  the chart value that drives it.

For production sizing — S3 storage, multi-replica HA, observability, and webhook
hardening — see the [Production guide](/installation/production/).

## Verify

For the operator shape, confirm the Deployment is available and the CRDs are
registered:

```shell
kubectl --namespace jaas-system rollout status deploy/jaas
kubectl get crd jsonnetsnippets.jaas.metio.wtf jsonnetlibraries.jaas.metio.wtf
```

For the HTTP renderer, confirm the pod is ready and the endpoint answers:

```shell
kubectl --namespace jaas-system get pods --selector app.kubernetes.io/name=jaas
kubectl --namespace jaas-system port-forward svc/jaas 8080:8080 &
curl http://localhost:8080/jsonnet/dashboards
```
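
A backgrounded `kubectl port-forward` may not be listening yet when `curl` runs, so the first request can fail spuriously. One way around that is a small retry loop. The sketch below is an illustration, not part of JaaS; `wait_for_http` is a hypothetical helper and the retry counts are arbitrary defaults.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as URL answers with a non-error status, or 1 after
# ATTEMPTS failed tries (default 10, waiting DELAY_SECONDS between tries).
wait_for_http() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (snippet name is illustrative):
#   wait_for_http http://localhost:8080/jsonnet/dashboards && echo reachable
```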

## Next steps

- [Quickstart tutorial](/tutorials/quickstart/) — five steps from a Helm install
  to a published artifact.
- [Production hardening](/installation/production/) — storage, observability, the
  admission webhook, and multi-replica HA.


---

# Operations

Source: https://jaas.projects.metio.wtf/installation/operations/


Day-two operations for a running JaaS install. Initial install and hardening
decisions are in [Kubernetes](/installation/kubernetes/) and
[Production](/installation/production/).

## Graceful shutdown and drain

When Kubernetes sends `SIGTERM`, JaaS shuts down in three steps to avoid
dropping in-flight requests:

1. The readiness probe flips to `false` (`503` on `/ready`). Kubernetes
   endpoint controllers begin deregistering the pod from Services.
2. JaaS waits for `--shutdown-delay` (default `5s`) before closing its
   listeners. This window lets the endpoint propagation complete so no new
   traffic arrives after the server closes.
3. After the delay, the servers shut down gracefully with a 30-second
   `context.WithTimeout`. The operator goroutine is also cancelled and awaited
   within the same 30-second window.

The distroless runtime image has no `sleep` binary, so the drain delay is
implemented in the binary rather than via a `preStop` hook. A second
`SIGTERM` (or `SIGINT`) during the drain cuts the wait short.

To disable the drain (zero delay):

```shell
--shutdown-delay 0
```

The chart value is `arguments.shutdownDelay`.
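
The worst-case time between `SIGTERM` and process exit is the drain delay plus the 30-second graceful-shutdown timeout. The sketch below only does that arithmetic; `shutdown_budget_seconds` is a hypothetical helper, and the delay is given in whole seconds.

```shell
# shutdown_budget_seconds DELAY_SECONDS
# Prints the worst-case seconds between SIGTERM and exit: the drain delay
# plus the 30-second graceful-shutdown timeout described above.
shutdown_budget_seconds() {
  delay_seconds="$1"
  graceful_timeout_seconds=30
  echo $((delay_seconds + graceful_timeout_seconds))
}

# Usage with the default 5s delay:
#   shutdown_budget_seconds 5
```

Kubernetes sends `SIGKILL` once the pod's `terminationGracePeriodSeconds` elapses, so that value should be at least this budget. Whether the chart already accounts for it is not covered on this page, so check the rendered Deployment if you raise the delay.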

## Leader election during rolling updates

In operator mode, leader election is on by default (`--leader-election`,
`operator.leaderElection.enabled: true`). The chart sets
`LeaderElectionReleaseOnCancel: true`, so when the old pod receives `SIGTERM` it
releases the lease immediately instead of waiting out the 15-second
`LeaseDuration`. The new pod picks up the lease within seconds.

Snippets that were `Ready=True` before the restart stay in that condition via
cached state. A new pod that takes over as leader reconciles them on the next
watch event. If snippets remain degraded for more than a few seconds after a
restart, check the [operator-watch-silent](/runbooks/operator-watch-silent/)
runbook — it diagnoses the case where the operator's own ClusterRole is missing
a verb so controller-runtime's informer silently fails to start.

To force-restart the operator (e.g. after an upgrade):

```shell
kubectl rollout restart deployment/jaas --namespace jaas-system
```

## Artifact retention and storage GC

Three independent mechanisms govern how long artifacts stay on disk (or in S3).
Full storage backend configuration is in [Storage and HA](/usage/storage-and-ha/).

### GC grace window (`--artifact-gc-grace`, default `5m`)

When a snippet is re-rendered, the superseded revision drops out of the keep-set
but remains fetchable for `--artifact-gc-grace` after supersession. This closes
the pin→fetch race in which a Flux consumer reads `status.artifact.url` a moment
before the operator GC-prunes the old tarball. The window survives operator
restarts — supersession time is derived from on-disk storage metadata, not from
in-memory state.

Set `0` to restore eager pruning (one revision at a time, matching stock Flux
source-controller semantics). Tune lower when storage capacity is tight and all
consumers are in-cluster.

### History retention (`spec.history`, default `1`, max `50`)

Per-snippet deliberate retention for rollback and blue-green flows. A downstream
consumer can pin to a specific `sha256` digest indefinitely as long as that
revision is within the history keep-set. This is separate from the GC grace
window — it is explicit operator intent, not a race-protection mechanism.

### Orphaned `.tmp` sweep (`--storage-sweep-interval`, `--storage-sweep-max-tmp-age`)

A `Put` that dies between writing the tempfile and the atomic rename leaves a
`<rev>.tar.gz.tmp` residue. The sweep goroutine runs on a ticker (default every
`10m`) and removes `.tmp` files older than `--storage-sweep-max-tmp-age`
(default `30m`). The age floor ensures live writers are never raced.

Set `--storage-sweep-interval 0` to disable the sweep entirely. The
`jaas_storage_sweep_failures_total` Prometheus counter signals failing sweep
passes.
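
For the `local` backend, you can preview which files a sweep pass would consider stale without deleting anything. This is a sketch under assumptions: `list_stale_tmp` is a hypothetical helper, the storage directory is passed in because its path is not stated here, and `find -mmin` is a GNU/BSD extension rather than strict POSIX.

```shell
# list_stale_tmp DIR MAX_AGE_MINUTES
# Lists *.tar.gz.tmp files under DIR last modified more than
# MAX_AGE_MINUTES ago. Read-only: nothing is removed.
list_stale_tmp() {
  dir="$1"
  max_age_minutes="$2"
  find "$dir" -type f -name '*.tar.gz.tmp' -mmin "+${max_age_minutes}" | sort
}

# Usage (hypothetical storage path, 30 minutes matches the default max age):
#   list_stale_tmp /path/to/storage 30
```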

## Finalizer teardown and the WithdrawForced safety valve

Every `JsonnetSnippet` holds a finalizer (`jaas.metio.wtf/finalizer`) that
blocks Kubernetes garbage collection until the operator successfully calls
`Publisher.Withdraw` to remove the artifact from storage. If the backend is
permanently unavailable (S3 down, RBAC revoked, bucket deleted), the finalizer
would otherwise hold the snippet — and by extension its namespace — in
`Terminating` forever.

`--max-withdraw-wait` (default `1h`) bounds how long the finalizer can hold.
Once the deadline passes, the operator:

1. Emits a `Warning WithdrawForced` Kubernetes Event on the snippet.
2. Drops the finalizer so the snippet can be garbage-collected.

The trade-off is a possible orphan tarball in storage. Recover it using the
[storage-recovery](/runbooks/storage-recovery/) runbook.

Adjust the bound with the chart value `operator.maxWithdrawWait`. Lower it in
environments where namespace teardown latency is critical; raise it (or remove
the concern by fixing the backend) in environments where artifact-safety is
paramount.
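
To find snippets whose finalizer was dropped this way, you can filter Events by reason. A minimal sketch, assuming `kubectl` access to the namespace and that the Events have not yet expired; `forced_withdrawals` is a hypothetical helper name.

```shell
# forced_withdrawals NAMESPACE
# Lists Warning events with reason WithdrawForced in NAMESPACE.
forced_withdrawals() {
  namespace="$1"
  kubectl --namespace "$namespace" get events \
    --field-selector reason=WithdrawForced,type=Warning
}

# Usage (illustrative namespace):
#   forced_withdrawals default
```

Each hit is a candidate for an orphan tarball in storage.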

## Upgrades

Calendar-based releases ship every Monday. The chart version and the binary
version advance together.

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system \
  --values my-values.yaml \
  --wait --timeout 5m
```

The chart ships CRDs under `templates/` so `helm upgrade --install` applies schema
changes automatically.

**Before each upgrade**, read
[MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md):

- Releases that change `spec.selector.matchLabels` on the Deployment require a
  manual `kubectl --namespace jaas-system delete deployment/jaas` first. That
  field is immutable, so `helm upgrade --install` fails otherwise.
- The pre-delete cleanup Job (`operator.cleanupOnDelete.enabled: true`, the
  default) runs on `helm uninstall` and drops every snippet's finalizer so
  `ExternalArtifact` resources are unwound before the operator pod is removed.
  If the cleanup Job hangs, check `operator.cleanupOnDelete.kubectlTimeout`
  (default `2m`) and the backend health.

## Monitoring operational health

Key signals to watch:

- `jaas_storage_sweep_failures_total` — non-zero means the sweep goroutine is
  erroring; investigate storage backend health.
- `jaas_snippet_reconcile_total{status!="Synced"}` — elevated rate means
  snippets are failing to render; cross-reference with the `reason` label and
  the relevant runbook.
- `JaaSControllerWorkqueueDepthHigh` PrometheusRule alert — workqueue is backing
  up; the operator cannot keep up with the reconcile rate.
- `/ready` probe on the management port (default `8081`) — `503` after startup
  means the manager has not yet been elected or its cache has not synced.

All metrics are documented in [Observability](/usage/observability/). All shipped
alerts link to [Runbooks](/runbooks/).
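
Without a Prometheus at hand, a counter can be spot-checked straight from the metrics endpoint. The sketch below sums every sample of one metric from Prometheus text exposition read on stdin. `metric_total` is a hypothetical helper, the `/metrics` path in the usage comment is an assumption based on Prometheus convention, and samples are assumed to carry no trailing timestamp.

```shell
# metric_total NAME
# Reads Prometheus text exposition from stdin and prints the sum of all
# samples of metric NAME across its label sets (0 if the metric is absent).
metric_total() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP and TYPE comments
    {
      metric = $1
      sub(/[{].*/, "", metric)           # drop the label set, if any
      if (metric == name) total += $NF   # last field is the sample value
    }
    END { print total + 0 }
  '
}

# Usage (assumed path, port from the Production guide):
#   curl -s http://localhost:8083/metrics | metric_total jaas_storage_sweep_failures_total
```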

## Next steps

- [Configuration reference](/installation/configuration/) — the full flag
  list with defaults and chart value equivalents.
- [Runbooks](/runbooks/) — incident response procedures keyed to each
  `Ready` condition `Reason`.


---

# Production

Source: https://jaas.projects.metio.wtf/installation/production/


The chart's defaults are safe for an initial install but not optimised for
sustained production workloads. Work through these decisions before exposing JaaS
to real traffic. Each links to the detailed guide.

## 1. Pick a storage backend

The single largest decision. Artifacts must survive pod restarts and, for HA,
be readable by every replica simultaneously.

| Backend | Persistence | Multi-replica HA |
|---|---|---|
| `local` + emptyDir (chart default) | No | No |
| `local` + RWO PVC | Yes | No — single replica only |
| `local` + RWX PVC | Yes | Yes — requires RWX storage class |
| `s3` | Yes | Yes — leader writes, all replicas read |

For cloud installs, `s3` (AWS S3, MinIO, Ceph RGW, GCS S3-compat API) is the
recommended backend. Pair it with leader election (on by default) so only the
lease-holder writes. For on-prem, a PVC with the access mode your storage class
supports is the practical path.

Full configuration options and artifact retention are covered in
[Storage and HA](/usage/storage-and-ha/) — including the garbage-collection grace
period (`--artifact-gc-grace`) that keeps a just-superseded revision fetchable for
a short window, so a consumer that read `status.artifact` moments before pruning
doesn't 404 on the revision it pinned.

Minimal S3 values (IRSA on EKS):

```yaml
operator:
  enabled: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/jaas-operator
  storage:
    backend: s3
    s3:
      endpoint: s3.amazonaws.com
      bucket: my-jaas-artifacts
      prefix: prod
      region: eu-west-1
      useSSL: true
      # Leave accessKey/secretKey empty — IAM role via SA annotation.
```

## 2. Size CPU and memory

The chart defaults (64 MiB memory, 32m CPU) are fine for a quickstart, but the
memory limit is easily exceeded under sustained snippet rendering, which ends in
an OOM kill. go-jsonnet has no mid-evaluation cancellation, so every in-flight
evaluation runs to completion; CPU and memory limits must therefore accommodate
the worst-case concurrent eval load.

Set `--max-artifact-bytes` to cap the rendered output size per snippet so a
runaway template can't allocate unbounded memory before the timeout fires.

See [Evaluation and security](/usage/evaluation-and-security/) for the
concurrent-eval cap, timeout defaults, and how to tune them.

```yaml
resources:
  memory: 256Mi
  cpu: 100m

operator:
  storage:
    maxArtifactBytes: 16777216  # 16 MiB; fails with ReasonArtifactTooLarge
```
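
The `maxArtifactBytes` value is a plain byte count, so a size chosen in MiB has to be multiplied out. A small sketch of that arithmetic; `mib_to_bytes` is a hypothetical helper.

```shell
# mib_to_bytes MIB
# Prints MIB mebibytes as a byte count (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Usage:
#   mib_to_bytes 16
```

For 16 MiB this yields the `16777216` used in the values above.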

## 3. Enable observability

The chart ships a metrics endpoint (on by default at port `8083`), an opt-in
`ServiceMonitor`, and an opt-in `PrometheusRule` with a starter alert set. Turn
them on and wire the Prometheus selector labels before deploying:

```yaml
operator:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      labels:
        release: kube-prom          # match your Prometheus's serviceMonitorSelector
    prometheusRule:
      enabled: true
      labels:
        release: kube-prom          # match your Prometheus's ruleSelector
      extraAlertLabels:
        team: platform              # Alertmanager routing label
```

`serviceMonitor.labels`, `prometheusRule.labels`, and
`prometheusRule.extraAlertLabels` are three distinct label knobs:
`serviceMonitor.labels` and `prometheusRule.labels` control which Prometheus
instance picks up each CRD object; `extraAlertLabels` adds routing labels to
individual alerts (for Alertmanager), not to the rule object.

The shipped alert set and all custom JaaS metrics are documented in
[Observability](/usage/observability/).

## 4. Enable the admission webhook {#admission-webhook-tls}

The webhook rejects spec invariant violations — ext-var key collisions, library
alias shadowing, import cycles — at `kubectl apply` time instead of at
reconcile time. Pick a cert mode:

```yaml
# Option A: cert-manager (recommended when cert-manager is installed)
operator:
  webhook:
    enabled: true
    certMode: cert-manager
    certManager:
      enabled: true
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-prod

---
# Option B: self-signed (no cert-manager required)
operator:
  webhook:
    enabled: true
    certMode: self-signed
```

The default `failurePolicy: Fail` blocks every `JsonnetSnippet` create/update
cluster-wide when the webhook is unavailable. During a rolling update the window
is typically under five seconds (leader election releases the lease on
SIGTERM). If your GitOps tooling cannot tolerate that, scope the webhook via
`operator.webhook.namespaceSelector` or `operator.webhook.objectSelector`, or
switch to `failurePolicy: Ignore` and rely on the reconciler-side fallback.

Full cert provisioning and failurePolicy trade-offs are covered in
[Admission webhook](/usage/admission-webhook/).

## 5. Lock down tenant RBAC

Every `JsonnetSnippet` runs impersonated as its `spec.serviceAccountName` (or
the `--default-service-account` fallback). The operator's own ServiceAccount
only needs `serviceaccounts/token: create` — every other API call (library
reads, source fetches, `ExternalArtifact` writes) is done under the tenant SA's
RBAC, so a compromised snippet can only reach what its SA is allowed to.

Minimum per-tenant `Role`:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant
rules:
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
  - apiGroups: [jaas.metio.wtf]
    resources: [jsonnetlibraries]
    verbs: [get, list]
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [gitrepositories, ocirepositories, buckets, externalartifacts]
    verbs: [get]
```

When `operator.watchNamespaces` is set, the chart automatically switches from a
ClusterRoleBinding to per-namespace RoleBindings. Full RBAC layout and
NetworkPolicy notes for `spec.sourceRef` fetches are in
[Tenancy and RBAC](/usage/tenancy-and-rbac/).

## 6. Plan for upgrades and disaster recovery

Calendar-based releases ship every Monday. Chart upgrades are `helm upgrade
--install`; the chart ships CRDs under `templates/` so schema changes apply
automatically. Read
[MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md) before
each upgrade: releases that change the Deployment's immutable
`spec.selector.matchLabels` field require a manual
`kubectl --namespace jaas-system delete deploy/jaas` first.

Three runbooks to bookmark before go-live:

- [storage-recovery](/runbooks/storage-recovery/) — PVC loss, S3 outages,
  disk-full, downstream 404s.
- [rbacdenied](/runbooks/rbacdenied/) — tenant SA missing a verb, ExternalArtifact
  write forbidden, Flux source CRD not installed.
- [operator-watch-silent](/runbooks/operator-watch-silent/) — the one failure
  mode JaaS cannot surface in snippet status (operator's own ClusterRole missing
  a verb so controller-runtime's informer silently fails).

## A complete production values.yaml

```yaml
replicas:
  min: 2
  max: 5

resources:
  memory: 256Mi
  cpu: 100m

image:
  pullPolicy: IfNotPresent

namespace:
  create: true
  podSecurity:
    enforce: restricted

operator:
  enabled: true
  defaultServiceAccount: ""   # force per-snippet spec.serviceAccountName

  serviceAccount:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/jaas-operator  # TODO

  storage:
    backend: s3
    s3:
      endpoint: s3.amazonaws.com  # TODO
      bucket: my-jaas-artifacts   # TODO
      prefix: prod
      region: eu-west-1           # TODO
    maxArtifactBytes: 16777216

  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      labels:
        release: kube-prom        # TODO
    prometheusRule:
      enabled: true
      labels:
        release: kube-prom        # TODO
      extraAlertLabels:
        team: platform            # TODO

  webhook:
    enabled: true
    certMode: cert-manager
    certManager:
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-prod    # TODO

  leaderElection:
    enabled: true

  cleanupOnDelete:
    enabled: true
```

Apply with:

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --values production-values.yaml \
  --wait --timeout 5m
```

## Next steps

- [Operations](/installation/operations/) — day-two tasks: rolling restarts,
  storage sweeping, finalizer teardown.
- [Configuration reference](/installation/configuration/) — every flag and default.
- [Runbooks](/runbooks/) — incident response.


---

# Tutorials

Source: https://jaas.projects.metio.wtf/tutorials/


Complete, copy-paste paths from nothing to a working result. The two integration
tutorials cover the JaaS side — authoring a snippet and publishing the artifact;
the consuming side (how grafana-operator reconciles a dashboard, how StageSet
gates a rollout) is linked from each tutorial rather than repeated here.


---

# Deploying manifests with StageSet

Source: https://jaas.projects.metio.wtf/tutorials/deploying-manifests/


JaaS pairs with [stageset-controller](https://stageset.projects.metio.wtf/) to
deploy Kubernetes manifests as code: you author the manifests in Jsonnet, the
JaaS operator renders and publishes them as a Flux `ExternalArtifact`, and
stageset-controller rolls that artifact out across ordered, gated stages.

This tutorial covers the JaaS side — authoring the manifests with top-level
arguments and external variables, and publishing the rendered JSON. The rollout
side (the `StageSet` resource, its stages, gates, and actions) lives on the
stageset-controller site and is linked at the end.

## Prerequisites

- The JaaS operator installed and a tenant ServiceAccount granted the
  `externalartifacts` write verbs. The [Quickstart](/tutorials/quickstart/)
  covers both.
- stageset-controller installed, if you intend to follow the handoff section and
  roll the manifests out.

This tutorial uses the namespace `default` and the tenant ServiceAccount
`manifests-tenant`.

## Step 1 — Grant the tenant ServiceAccount its verbs

The snippet publishes an `ExternalArtifact`, so the tenant needs the
`externalartifacts` write verbs:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: manifests-tenant
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: manifests-tenant
rules:
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: manifests-tenant
subjects:
  - kind: ServiceAccount
    name: manifests-tenant
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: manifests-tenant
EOF
```

Verify the ServiceAccount and binding:

```shell
kubectl --namespace default get serviceaccount manifests-tenant
kubectl --namespace default get rolebinding manifests-tenant
```
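
The two `get` commands confirm that the objects exist, not that the verbs are effective. As a minimal sketch, `kubectl auth can-i` with impersonation can ask the API server about each verb directly. The helper name `check_tenant_verbs` is hypothetical, and the sketch assumes your own user is allowed to impersonate ServiceAccounts, which `--as` requires.

```shell
# check_tenant_verbs NAMESPACE SERVICEACCOUNT RESOURCE VERB...
# Hypothetical helper: asks the API server whether the ServiceAccount may
# perform each verb on the resource. Prints one "verb: answer" line per verb
# and returns non-zero if any verb is denied.
check_tenant_verbs() {
  ns=$1; sa=$2; resource=$3; shift 3
  denied=0
  for verb in "$@"; do
    # kubectl auth can-i prints yes/no and exits non-zero on "no".
    if answer=$(kubectl auth can-i "$verb" "$resource" \
        --namespace "$ns" \
        --as "system:serviceaccount:$ns:$sa"); then
      :
    else
      denied=1
    fi
    echo "$verb: $answer"
  done
  return "$denied"
}

# Usage against the Role applied above:
# check_tenant_verbs default manifests-tenant \
#   externalartifacts.source.toolkit.fluxcd.io get create update patch
```

A `no` on any verb points at the Role or RoleBinding rather than at JaaS.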

## Step 2 — Author and apply the manifest snippet

The snippet renders a `List` of a `Deployment` and a `Service` from top-level
arguments (`spec.tlas`) and an external variable (`spec.externalVariables`). A
single-value TLA arrives as a string; the snippet parses the replica count with
`std.parseInt`. `spec.output` stays at its default `rendered` so the artifact
carries the evaluated manifest JSON:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: web-app
  namespace: default
spec:
  serviceAccountName: manifests-tenant
  output: rendered
  files:
    main.jsonnet: |
      function(name='web', replicas='2')
        local image = std.extVar('image');
        local labels = { 'app.kubernetes.io/name': name };
        {
          apiVersion: 'v1',
          kind: 'List',
          items: [
            {
              apiVersion: 'apps/v1',
              kind: 'Deployment',
              metadata: { name: name, labels: labels },
              spec: {
                replicas: std.parseInt(replicas),
                selector: { matchLabels: labels },
                template: {
                  metadata: { labels: labels },
                  spec: {
                    containers: [{
                      name: name,
                      image: image,
                      ports: [{ containerPort: 8080 }],
                    }],
                  },
                },
              },
            },
            {
              apiVersion: 'v1',
              kind: 'Service',
              metadata: { name: name, labels: labels },
              spec: {
                selector: labels,
                ports: [{ port: 80, targetPort: 8080 }],
              },
            },
          ],
        }
  tlas:
    name: [web]
    replicas: ["3"]
  externalVariables:
    image: "ghcr.io/example/web:1.4.0"
EOF
```

Each `spec.tlas` value is a list, matching the HTTP query-parameter convention:
a single element becomes a string TLA, multiple elements a JSON-encoded array.
External variables seed `std.extVar` lookups.

## Step 3 — Confirm the manifests rendered

```shell
kubectl --namespace default get jsonnetsnippet web-app
# NAME      READY   URL                                                                                     AGE
# web-app   True    http://jaas-storage.jaas-system.svc.cluster.local:8082/default/web-app/<sha256>.tar.gz  5s
```

If `READY` is `False`, describe the snippet — the Ready condition's `Reason` and
`Message` name the cause:

```shell
kubectl --namespace default describe jsonnetsnippet web-app
```

## Step 4 — Inspect the published manifests

Fetch the artifact from a one-shot pod to see the rendered manifests:

```shell
URL=$(kubectl --namespace default get jsonnetsnippet web-app -o jsonpath='{.status.artifactURL}')
kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \
    sh -c "curl -fsSL '$URL' | tar -xzO rendered.json"
# {
#    "apiVersion": "v1",
#    "kind": "List",
#    "items": [ { "kind": "Deployment", ... }, { "kind": "Service", ... } ]
# }
```

`rendered.json` is the manifest set a Flux consumer applies to the cluster.

## Handoff: roll the manifests out with StageSet

The published `ExternalArtifact` is now ready for a Flux consumer. A consumer
references it in one of two ways:

- **Directly** — name the `ExternalArtifact` (which shares the snippet's name and
  namespace) in a `sourceRef`.
- **Producer-aware** — name the producing `JsonnetSnippet` and let the consumer
  resolve it to the `ExternalArtifact`. JaaS writes a three-field back-pointer
  (`apiVersion`, `kind`, `name`) under the artifact's `spec.sourceRef` for this,
  which is the contract producer-aware resolvers match on.

stageset-controller consumes the published artifact the producer-aware way and
rolls it out across ordered, gated stages. The `StageSet` resource, its stages,
gates, and actions live on the stageset-controller documentation:

- **stageset-controller producer-aware sources guide:**
  <https://stageset.projects.metio.wtf/usage/producer-aware-sources/>
- **stageset-controller project:** <https://stageset.projects.metio.wtf/>

Follow that guide for the rollout side; it picks up exactly where this tutorial
leaves off — at the published `ExternalArtifact`.

## Where to go next

- [Operator mode](/usage/operator-mode/) — the full operator reference,
  including the `ExternalArtifact` `spec.sourceRef` back-pointer contract that
  producer-aware consumers match on.
- [Snippet sources](/usage/snippet-sources/) — back the manifests with a
  `GitRepository` or `OCIRepository` instead of inline `spec.files`.


---

# Grafana dashboards

Source: https://jaas.projects.metio.wtf/tutorials/grafana-dashboards/


JaaS pairs with the
[grafana-operator](https://grafana.github.io/grafana-operator/) to manage
Grafana dashboards as code: you author the dashboard in Jsonnet, the JaaS
operator renders it and publishes the dashboard JSON as a Flux
`ExternalArtifact`, and the grafana-operator reconciles that artifact into a live
Grafana instance.

This tutorial covers the JaaS side — authoring the dashboard, importing
grafonnet as a `JsonnetLibrary`, and publishing the rendered JSON. The
grafana-operator side (the `GrafanaDashboard` CR, datasources, folders) lives on
the grafana-operator site and is linked at the end.

## Prerequisites

- The JaaS operator installed and a tenant ServiceAccount granted the
  `externalartifacts` write verbs. The [Quickstart](/tutorials/quickstart/)
  covers both.
- The grafana-operator installed, if you intend to follow the handoff section
  and reconcile the dashboard into Grafana.

This tutorial uses the namespace `default` and the tenant ServiceAccount
`dashboards-tenant`.

## Step 1 — Grant the tenant ServiceAccount its verbs

The snippet imports a `JsonnetLibrary`, so on top of the `externalartifacts`
write verbs the tenant needs `get` on `jsonnetlibraries`:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboards-tenant
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: dashboards-tenant
rules:
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
  - apiGroups: [jaas.metio.wtf]
    resources: [jsonnetlibraries]
    verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: dashboards-tenant
subjects:
  - kind: ServiceAccount
    name: dashboards-tenant
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dashboards-tenant
EOF
```

Verify the ServiceAccount and binding:

```shell
kubectl --namespace default get serviceaccount dashboards-tenant
kubectl --namespace default get rolebinding dashboards-tenant
```

## Step 2 — Publish the dashboard helpers as a JsonnetLibrary

A `JsonnetLibrary` holds reusable `.libsonnet` files that snippets in the same
namespace import by alias. The example below carries a minimal set of dashboard
constructors. In a production setup this is where grafonnet lives — see
[Jsonnet libraries](/usage/jsonnet-libraries/) for serving the full grafonnet
tree from an `OCIRepository`.

```shell
cat <<EOF | kubectl apply -f -
apiVersion: jaas.metio.wtf/v1
kind: JsonnetLibrary
metadata:
  name: grafana-helpers
  namespace: default
spec:
  files:
    dashboard.libsonnet: |
      {
        new(title): {
          title: title,
          schemaVersion: 38,
          panels: [],
        },
      }
    panel.libsonnet: |
      {
        timeseries(title, expr): {
          type: 'timeseries',
          title: title,
          targets: [{ expr: expr }],
        },
        stat(title, expr): {
          type: 'stat',
          title: title,
          targets: [{ expr: expr }],
        },
      }
EOF
```

Verify the library:

```shell
kubectl --namespace default get jsonnetlibrary grafana-helpers
```

## Step 3 — Author and apply the dashboard snippet

The `JsonnetSnippet` imports the library by the alias declared in
`spec.libraries[*].importPath`, composes a dashboard from its constructors, and
leaves `spec.output` at its default `rendered` so the published artifact carries
the evaluated dashboard JSON:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  output: rendered
  files:
    main.jsonnet: |
      local dashboard = import 'grafana/dashboard.libsonnet';
      local panel = import 'grafana/panel.libsonnet';
      dashboard.new('API Latency') + {
        panels: [
          panel.timeseries('p99 by route', 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))'),
          panel.stat('error rate', 'sum(rate(http_requests_total{code=~"5.."}[5m]))'),
        ],
      }
  libraries:
    - kind: JsonnetLibrary
      name: grafana-helpers
      importPath: grafana
EOF
```

The `importPath: grafana` ties `import 'grafana/dashboard.libsonnet'` to the
`grafana-helpers` library. `importPath` defaults to the library's
`metadata.name`, so naming the library `grafana` would let you drop the field.
`kind` is always `JsonnetLibrary`.

## Step 4 — Confirm the dashboard rendered

```shell
kubectl --namespace default get jsonnetsnippet api-latency
# NAME          READY   URL                                                                                         AGE
# api-latency   True    http://jaas-storage.jaas-system.svc.cluster.local:8082/default/api-latency/<sha256>.tar.gz  5s
```

If `READY` is `False`, describe the snippet — the Ready condition's `Reason` and
`Message` name the cause (an RBAC gap on the library, an import alias collision,
or a Jsonnet error):

```shell
kubectl --namespace default describe jsonnetsnippet api-latency
```

## Step 5 — Inspect the published dashboard JSON

Fetch the artifact from a one-shot pod to see the rendered dashboard:

```shell
URL=$(kubectl --namespace default get jsonnetsnippet api-latency -o jsonpath='{.status.artifactURL}')
kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \
    sh -c "curl -fsSL '$URL' | tar -xzO rendered.json"
# {
#    "panels": [ ... ],
#    "schemaVersion": 38,
#    "title": "API Latency"
# }
```

`rendered.json` is the Grafana dashboard model — the exact JSON the
grafana-operator hands to Grafana's dashboard API.

## Use real grafonnet instead of the toy helpers

`grafana-helpers` kept this tutorial self-contained, but in production you import
the real [grafonnet](https://github.com/grafana/grafonnet) library from a JOI
image rather than hand-rolling constructors. Install it as a `JsonnetLibrary`
with the [`joi` Helm chart](https://github.com/metio/helm-charts/tree/main/charts/joi):

```shell
helm upgrade --install joi oci://ghcr.io/metio/helm-charts/joi \
  --namespace default \
  --set libraries.grafonnet.enabled=true
```

That renders an `OCIRepository` plus a `JsonnetLibrary` named `grafonnet`,
sourcing `ghcr.io/metio/joi-grafana-grafonnet`. The snippet then references that
library in place of `grafana-helpers` and imports the real grafonnet API by its
full jb-vendor path:

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  libraries:
    - kind: JsonnetLibrary
      name: grafonnet
  files:
    main.jsonnet: |
      local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';
      g.dashboard.new('API Latency')
      + g.dashboard.withUid('api-latency')
```

Everything downstream is unchanged: the snippet reconciles and publishes an
`ExternalArtifact` exactly as in Steps 4–5; only the source of the library
differs.

## Handoff: reconcile the dashboard into Grafana

The published `ExternalArtifact` is now ready for the grafana-operator to
consume. The grafana-operator reconciles a JaaS-published dashboard into Grafana
through a `GrafanaDashboard` CR that references the artifact. That configuration
— the `GrafanaDashboard` resource, the datasource and folder wiring, and the
`Grafana` instance — lives on the grafana-operator's own documentation:

- **grafana-operator JaaS example:**
  <https://grafana.github.io/grafana-operator/docs/examples/dashboard/jaas/readme/>
- **grafana-operator project:** <https://grafana.github.io/grafana-operator/>

Follow that example for the Grafana side; it picks up exactly where this
tutorial leaves off — at the published `ExternalArtifact`.

## Where to go next

- [Jsonnet libraries](/usage/jsonnet-libraries/) — serve the full grafonnet
  tree as a `JsonnetLibrary` backed by an `OCIRepository`, with the empty-`path`
  whole-vendor-tree pattern.
- [Snippet sources](/usage/snippet-sources/) — back the dashboard with a
  `GitRepository` or `OCIRepository` instead of inline `spec.files`, and point
  `spec.entryFile` at one dashboard in a multi-dashboard tree.


---

# Local rendering

Source: https://jaas.projects.metio.wtf/tutorials/local-rendering/


JaaS runs as a cluster-free Jsonnet renderer: point it at a directory of
snippets and a directory of libraries, then `GET` a snippet name to receive the
evaluated JSON. No Kubernetes, no operator mode, no Flux. The evaluation core is
the same one the operator uses, so a snippet that renders correctly here renders
identically in-cluster.

This tutorial runs against the `examples/` layout of the JaaS repository, so
clone the repo first:

```shell
git clone https://github.com/metio/jaas
cd jaas
```

## Step 1 — Get the binary or container image

Pre-built binaries are attached to each
[GitHub release](https://github.com/metio/jaas/releases). Download the archive
for your platform, unpack it, and the `jaas` binary is inside.

A container image is published at `ghcr.io/metio/jaas:latest`:

```shell
docker pull ghcr.io/metio/jaas:latest
```

The examples below use a `jaas` binary on your `PATH`. To run the container
instead, mount `examples/` and map the port — for example
`docker run --rm -p 8080:8080 -v "$PWD/examples:/examples" ghcr.io/metio/jaas:latest`
with the flags adjusted to the in-container `/examples` paths, and
`--listen-address 0.0.0.0` so the port is reachable from the host.

## Step 2 — Run JaaS over the examples directory

Start JaaS with one snippet directory and one library path:

```shell
jaas \
  --snippet-directory examples/snippets/dashboards \
  --library-path examples/libraries
```

`--snippet-directory` exposes each subdirectory as a snippet whose name is the
directory name and whose entry file is `main.jsonnet`. `--library-path` makes the
libraries under `examples/libraries` importable by alias. Both flags repeat, so
you can pass several of each. The Jsonnet server binds `127.0.0.1:8080` by
default.
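
Before starting JaaS, the directory layout can be sanity-checked locally. The sketch below only inspects the filesystem and does not query JaaS; `list_snippets` is a hypothetical helper, and how JaaS treats a subdirectory without `main.jsonnet` is not covered here, so the sketch merely reports it.

```shell
# list_snippets DIRECTORY
# Hypothetical helper: prints the name of each immediate subdirectory and
# whether it contains the main.jsonnet entry file described above.
list_snippets() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue          # skip when the glob matches nothing
    name=$(basename "$dir")
    if [ -f "$dir/main.jsonnet" ]; then
      echo "$name: main.jsonnet found"
    else
      echo "$name: main.jsonnet missing"
    fi
  done
}

# Usage from the repository root:
# list_snippets examples/snippets/dashboards
```

Each printed name is the path segment you would request under `/jsonnet/`.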

Confirm it started by hitting the readiness probe on the management server:

```shell
curl -i http://127.0.0.1:8081/ready
# HTTP/1.1 200 OK
```
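
When JaaS is started from a script, the first request can race the server's startup. A minimal sketch of a polling loop over the readiness probe follows; `wait_ready` and its default of 30 attempts are illustrative choices, not part of JaaS.

```shell
# wait_ready URL [ATTEMPTS]
# Hypothetical helper: polls the readiness probe once per second until it
# returns a successful status, giving up after ATTEMPTS tries (default 30).
wait_ready() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -s keeps retries quiet.
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Usage:
# wait_ready http://127.0.0.1:8081/ready && curl http://127.0.0.1:8080/jsonnet/inheritance
```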

## Step 3 — Render a snippet

`examples/snippets/dashboards/inheritance` is a self-contained snippet. Request
it by directory name:

```shell
curl http://127.0.0.1:8080/jsonnet/inheritance
```

JaaS returns the evaluated Jsonnet as JSON with `Content-Type:
application/json`. The `library-precedence` snippet imports the `examplonet`
library you exposed with `--library-path`:

```shell
curl http://127.0.0.1:8080/jsonnet/library-precedence
```

A snippet name that resolves to no file returns a `404` with a JSON error body;
a Jsonnet error returns a `400` carrying the go-jsonnet diagnostic.
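
In a script it is often enough to branch on the status code alone. The sketch below uses curl's `--write-out` (`-w`) to print only the code; `render_status` is a hypothetical helper. The `404` and `400` readings follow the text above, while treating `200` as the success status is an assumption.

```shell
# render_status URL
# Hypothetical helper: requests a snippet, discards the body, and prints a
# one-line reading of the HTTP status code.
render_status() {
  # curl prints 000 and exits non-zero when it gets no response at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1") || true
  case "$code" in
    200) echo "$code rendered" ;;
    400) echo "$code Jsonnet evaluation error" ;;
    404) echo "$code snippet not found" ;;
    000) echo "$code no response (is JaaS running?)" ;;
    *)   echo "$code unexpected status" ;;
  esac
}

# Usage:
# render_status http://127.0.0.1:8080/jsonnet/inheritance
```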

## Step 4 — Pass a top-level argument

Top-level arguments arrive as URL query parameters. The `multi-tla` snippet is
`function(tags=["default"])` and joins its `tags` argument. Repeating a query
key passes a list, which becomes a JSON array TLA:

```shell
curl 'http://127.0.0.1:8080/jsonnet/multi-tla?tags=prod&tags=eu-west'
# {
#    "count": 2,
#    "joined": "prod, eu-west",
#    "list": [ "prod", "eu-west" ]
# }
```

A single occurrence of a query key (`?tags=prod`) passes a string instead of a
one-element array.
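
The repeat-the-key convention is easy to generate from a list of values. The following is a small sketch with a hypothetical helper, `tla_query`, that assumes the values are already URL-safe:

```shell
# tla_query KEY VALUE...
# Hypothetical helper: repeats KEY once per VALUE, producing the query string
# for one top-level argument. Values are assumed to be URL-safe already.
tla_query() {
  key=$1; shift
  query=
  for value in "$@"; do
    # Prefix with "&" only when the query already holds a pair.
    query="${query:+$query&}$key=$value"
  done
  echo "$query"
}

# Usage:
# curl "http://127.0.0.1:8080/jsonnet/multi-tla?$(tla_query tags prod eu-west)"
```

For values that contain spaces or reserved characters, curl can do the encoding itself: pass `--get` together with one `--data-urlencode 'tags=<value>'` per value.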

## Step 5 — Set an external variable

External variables are supplied through environment variables prefixed
`JAAS_EXT_VAR_`. The name after the prefix is the `std.extVar` key. Restart
JaaS with the variables the `example1` snippet reads (`name` and `key`):

```shell
JAAS_EXT_VAR_name=Alice \
JAAS_EXT_VAR_key=secret-value \
jaas \
  --snippet-directory examples/snippets/dashboards \
  --library-path examples/libraries
```

Then render the snippet:

```shell
curl http://127.0.0.1:8080/jsonnet/example1
# {
#    ...
#    "person1": {
#       "external": "secret-value",
#       "name": "Alice",
#       "welcome": "Hello Alice!"
#    },
#    ...
# }
```

`std.extVar('name')` and `std.extVar('key')` resolve to the values from the
environment. External variables are read once at startup, not per request.
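
Because the variables are read only at startup, it can help to check what the launching shell will actually hand to JaaS. A minimal sketch, with `list_ext_vars` as a hypothetical helper name; it prints values in the clear, so avoid it where the variables hold real secrets:

```shell
# list_ext_vars
# Hypothetical helper: prints each exported JAAS_EXT_VAR_* variable as
# "key=value", with the prefix stripped to show the std.extVar key.
list_ext_vars() {
  env | sed -n 's/^JAAS_EXT_VAR_//p' | sort
}

# Usage, in the shell that will start JaaS:
# export JAAS_EXT_VAR_name=Alice JAAS_EXT_VAR_key=secret-value
# list_ext_vars
```

Only exported variables appear, which matches what a child process such as `jaas` inherits.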

## Same core as the operator

The `jaas` binary evaluates Jsonnet through the same evaluation core whether it
serves HTTP locally or reconciles a `JsonnetSnippet` in operator mode. Local
rendering is the fast feedback loop for snippet authoring: a snippet that renders
here — with the same libraries available — renders identically when the operator
publishes it as an `ExternalArtifact`.

## Where to go next

- [Rendering endpoint](/usage/rendering-endpoint/) — the request shape, snippet
  resolution, the management probes, and the stable error contract.
- [Snippets and libraries](/usage/snippets-and-libraries/) — declaring snippets
  with `--snippet` and `--snippet-directory`, and libraries with `--library-path`.
- [External variables and TLAs](/usage/external-variables-and-tlas/) — the full
  `JAAS_EXT_VAR_*` and query-parameter rules.


---

# Quickstart

Source: https://jaas.projects.metio.wtf/tutorials/quickstart/


This tutorial takes you from an empty cluster to one published Flux
`ExternalArtifact` carrying rendered JSON. The path is operator mode with no
optional knobs — no webhook, no S3, no Flux source CRs — and a single
`JsonnetSnippet` whose source is inline `spec.files`.

## Prerequisites

- A Kubernetes cluster. `kind`, `minikube`, or a managed cluster all work.
- `kubectl` configured to talk to it.
- `helm` 3.x.
- Flux installed, at **v2.7.0 or newer**. A `JsonnetSnippet` publishes its
  result as a Flux `ExternalArtifact`, and the `ExternalArtifact` CRD lands in
  source-controller v1.7.0 (Flux v2.7.0) — earlier bundles have no such CRD and
  the publish path fails. Install all of Flux:

  ```shell
  kubectl apply -f https://github.com/fluxcd/flux2/releases/download/v2.7.0/install.yaml
  ```

## Step 1 — Install the chart

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --set operator.enabled=true \
  --set operator.defaultServiceAccount=default \
  --wait --timeout 5m
```

`operator.defaultServiceAccount=default` tells the operator which
ServiceAccount to impersonate in a tenant namespace when a snippet does not name
its own. That is fine for this tutorial; production assigns a dedicated SA per
tenant — see [Tenancy and RBAC](/usage/tenancy-and-rbac/).

Verify the operator is running:

```shell
kubectl --namespace jaas-system get deploy jaas
# NAME   READY   UP-TO-DATE   AVAILABLE   AGE
# jaas   1/1     1            1           30s
```

## Step 2 — Grant the tenant ServiceAccount the minimum verbs

The `default` ServiceAccount's built-in RBAC does not include the verbs the
operator needs to publish the artifact. In the tenant namespace — here `default`
— apply a `Role` and `RoleBinding`:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: jaas-tenant
rules:
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: jaas-tenant
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jaas-tenant
EOF
```

The operator impersonates the tenant ServiceAccount to upsert the
`ExternalArtifact` it publishes. Without `create`, `update`, and `patch` on
`externalartifacts`, the first reconcile fails with `Reason=RBACDenied`.

Verify the binding:

```shell
kubectl --namespace default get rolebinding jaas-tenant
# NAME          ROLE               AGE
# jaas-tenant   Role/jaas-tenant   5s
```

## Step 3 — Apply your first snippet

```shell
cat <<EOF | kubectl apply -f -
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: hello
  namespace: default
spec:
  serviceAccountName: default
  files:
    main.jsonnet: |
      {
        greeting: "hello from jaas",
        rendered_at: std.extVar("now"),
      }
  externalVariables:
    now: "quickstart"
EOF
```

This is a complete `JsonnetSnippet`. Three fields carry it:

- `spec.serviceAccountName` — the ServiceAccount the operator impersonates for
  this snippet's API calls.
- `spec.files.<filename>` — inline Jsonnet source. The default entry file is
  `main.jsonnet`; override it with `spec.entryFile`.
- `spec.externalVariables` — `std.extVar()` lookups available to the snippet.

Verify the resource exists:

```shell
kubectl --namespace default get jsonnetsnippet hello
```

## Step 4 — Confirm it reconciled

```shell
kubectl --namespace default get jsonnetsnippet hello
# NAME    READY   URL                                                                                    AGE
# hello   True    http://jaas-storage.jaas-system.svc.cluster.local:8082/default/hello/<sha256>.tar.gz   5s
```

The `URL` column is the artifact's address. If `READY` is `False`, describe the
resource — the `Reason` and `Message` on the Ready condition name the problem
(most commonly an RBAC gap or a Jsonnet syntax error):

```shell
kubectl --namespace default describe jsonnetsnippet hello
```

The `ExternalArtifact` is the resource downstream Flux consumers read:

```shell
kubectl --namespace default get externalartifact hello -o yaml
# status:
#   artifact:
#     url: http://jaas-storage.jaas-system.svc.cluster.local:8082/default/hello/<sha256>.tar.gz
#     digest: sha256:<hex>
#     revision: sha256:<hex>
#     size: <bytes>
```

## Step 5 — Fetch the rendered bytes

The URL resolves in-cluster only. Fetch the tarball from a one-shot pod:

```shell
URL=$(kubectl --namespace default get jsonnetsnippet hello -o jsonpath='{.status.artifactURL}')
kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \
    sh -c "curl -fsSL '$URL' | tar -xzO rendered.json"
# {
#    "greeting": "hello from jaas",
#    "rendered_at": "quickstart"
# }
```

The tarball carries a single `rendered.json` because `spec.output` defaults to
`rendered` (the evaluated JSON). Setting `spec.output: source` publishes the raw
`.jsonnet` files instead, for consumers that re-evaluate themselves.
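
If you save the tarball to a file instead of piping it into `tar`, you can compare it with the `digest` reported on the `ExternalArtifact`. This sketch assumes the digest is the SHA-256 of the tarball bytes, which is the Flux artifact convention but is not confirmed here for JaaS. `verify_digest` and the file name `artifact.tar.gz` are hypothetical, and `sha256sum` is assumed to be available.

```shell
# verify_digest FILE DIGEST
# Hypothetical helper: compares the SHA-256 of FILE with a "sha256:<hex>"
# digest string. Assumes the digest covers the tarball bytes.
verify_digest() {
  expected=${2#sha256:}                       # strip the algorithm prefix
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "digest ok"
  else
    echo "digest mismatch: got sha256:$actual" >&2
    return 1
  fi
}

# Usage:
# DIGEST=$(kubectl --namespace default get externalartifact hello \
#   -o jsonpath='{.status.artifact.digest}')
# verify_digest artifact.tar.gz "$DIGEST"
```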

## Clean up

```shell
kubectl --namespace default delete jsonnetsnippet hello
kubectl --namespace default delete rolebinding jaas-tenant
kubectl --namespace default delete role jaas-tenant
helm --namespace jaas-system uninstall jaas
kubectl delete namespace jaas-system
```

The chart's pre-delete hook waits for the snippet's finalizer to drop — which
removes the `ExternalArtifact` — before the operator pod is removed, so the
uninstall leaves no orphans.

## Where to go next

- [Grafana dashboards](/tutorials/grafana-dashboards/) — render grafonnet
  dashboards and hand them to the grafana-operator.
- [Deploying manifests with StageSet](/tutorials/deploying-manifests/) — render
  Kubernetes manifests and roll them out with stageset-controller.
- [Operator mode](/usage/operator-mode/) — the full operator reference: source
  kinds, leader election, the artifact contract.
- [Usage](/usage/) — every configuration knob, one page per concern.


---

# Usage

Source: https://jaas.projects.metio.wtf/usage/


One page per feature. JaaS has two faces over one evaluation core: the **HTTP
renderer** and the **Flux operator** (`--enable-flux-integration`). The first
pages below cover the renderer; the rest cover the operator. The
[API reference](/api/jsonnetsnippet/) carries the exhaustive field-by-field
detail.


---

# Admission webhook

Source: https://jaas.projects.metio.wtf/usage/admission-webhook/


JaaS ships an optional validating admission webhook for `JsonnetSnippet`. It is
independent of, but layered on top of, [operator mode](/usage/operator-mode/).

## Enabling it

Set `--enable-webhook` to boot the webhook server. It requires
`--enable-flux-integration` (the webhook is wired only inside the operator boot
path; enabling it alone is rejected as a flag error) and TLS material — `tls.crt`
and `tls.key` — under `--webhook-cert-dir` (default
`/tmp/k8s-webhook-server/serving-certs`). The server binds `--webhook-port`
(default `9443`).

## What it validates

The webhook rejects a `JsonnetSnippet` whose `spec.externalVariables` declares a
key that collides with an operator-level `--ext-var`. An operator-supplied
external variable always wins, so a snippet that tries to redeclare one would
render against a value it does not control; the webhook refuses the snippet at
admission time instead.

The reconciler enforces the same invariant as a fallback, so a snippet that
bypasses admission — for example when the webhook is unreachable under a
`failurePolicy: Ignore` — still fails at reconcile rather than rendering with the
wrong value.
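
The invariant itself is a set intersection: a snippet is rejected when any of its keys also appears among the operator-level keys. The sketch below illustrates that check only and is not the webhook's implementation; `colliding_keys` and the key names in the usage line are hypothetical.

```shell
# colliding_keys "OPERATOR_KEYS" "SNIPPET_KEYS"
# Illustration only: prints the keys present in both space-separated lists,
# one per line. An empty result means the snippet would pass this check.
colliding_keys() {
  for key in $2; do
    for reserved in $1; do
      if [ "$key" = "$reserved" ]; then
        echo "$key"
      fi
    done
  done
}

# Usage with hypothetical keys:
# colliding_keys "cluster region" "image region"
```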

## Failure policy trade-off

The Helm chart defaults `operator.webhook.failurePolicy: Fail`. With `Fail`, a
webhook outage blocks every `JsonnetSnippet` create and update cluster-wide until
the operator is back. During a rolling update that window is typically under five
seconds, because leader election releases the lease on context-cancel and the next
replica takes over immediately.

If your CI or GitOps tooling cannot tolerate even that window, narrow or relax the
webhook:

- `operator.webhook.objectSelector` — match only snippets carrying a label, e.g.
  require `jaas.metio.wtf/managed: "true"`.
- `operator.webhook.namespaceSelector` — opt in per namespace.
- `failurePolicy: Ignore` — let create/update through when the webhook is
  unreachable, relying on the reconciler-side fallback to catch the colliding-key
  case.

## TLS provisioning

`--webhook-cert-mode` selects how the serving certificate is provisioned.

### cert-manager (default)

`--webhook-cert-mode=cert-manager` expects external tooling to provision
`tls.crt`/`tls.key`. The Helm chart renders a `cert-manager.io/v1` Certificate and
mounts the issued Secret into the pod at `--webhook-cert-dir`. cert-manager
handles renewal; the webhook server hot-reloads TLS when the mounted files change.

### self-signed

`--webhook-cert-mode=self-signed` makes the operator generate a CA and serving
certificate in-pod, write them to `--webhook-cert-dir`, and patch the named
`ValidatingWebhookConfiguration`'s `caBundle` so the apiserver trusts the chain. A
background renewer regenerates and re-writes the files before expiry, and the
webhook server hot-reloads without a restart. The relevant flags:

| Flag | Purpose |
|---|---|
| `--webhook-validating-config-name` | Name of the `ValidatingWebhookConfiguration` whose `caBundle` is patched. Required in this mode. |
| `--webhook-service-name` | Service name the webhook is reachable through; used to build the certificate SANs (default `jaas-webhook`). |
| `--webhook-service-namespace` | Namespace the webhook Service lives in. Empty falls back to `--leader-election-namespace`, then to the in-cluster downward API. |
| `--webhook-cert-validity` | Validity of the self-signed serving certificate (default `8760h`, one year). |
| `--webhook-port` | Port the webhook server binds to (default `9443`). |

In self-signed mode the operator needs `get`/`update` on the named
`ValidatingWebhookConfiguration`. Multiple replicas bootstrapping at once during a
rolling update converge on a combined `caBundle` rather than clobbering each
other, so each replica's CA stays trusted across the rollout.

The full flag list with defaults is on the
[configuration page](/installation/configuration/).


---

# Alerting

Source: https://jaas.projects.metio.wtf/usage/alerting/


JaaS turns a sustained problem into a notification in two ways: a Prometheus
`PrometheusRule` that pages on its [metrics](/usage/metrics/), and Kubernetes
Events that Flux's notification-controller can route to chat or e-mail.

## The binary

The operator emits a standard Kubernetes `Event` on every Ready-condition
transition — `Normal` for `Synced`, `Warning` for every other reason. The reason
string fills both the event `reason` and `action`. These Events need no flag to
enable; they are written whenever the operator reconciles.

The operator also threads runbook links into its own status automatically: every
actionable Ready-condition Message gains a
`(runbook: https://jaas.projects.metio.wtf/runbooks/<reason>/)` suffix, so
`kubectl describe jsonnetsnippet` points straight at the matching page. Healthy
or intentional states (`Synced`, `Suspended`, `Pending`) get no suffix.
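
Tooling that forwards these messages can lift the link out of the suffix. A minimal sketch with a hypothetical helper, `runbook_url`, matching the `(runbook: <url>)` format quoted above:

```shell
# runbook_url MESSAGE
# Hypothetical helper: prints the URL from a "(runbook: <url>)" suffix, or
# nothing when the message carries no suffix.
runbook_url() {
  printf '%s\n' "$1" | sed -n 's/.*(runbook: \([^)]*\)).*/\1/p'
}

# Usage, assuming the conventional status.conditions layout:
# runbook_url "$(kubectl --namespace default get jsonnetsnippet hello \
#   -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}')"
```

Healthy states carry no suffix, so an empty result is expected for `Synced`, `Suspended`, and `Pending`.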

### Routing Events through Flux

Routing the Events is the job of Flux's `notification-controller`: target an
`Alert` CR at `kind: JsonnetSnippet`, and JaaS needs no `Provider`/`Alert`
plumbing of its own.

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: jaas-snippets
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: warn   # 'info' to include success events
  eventSources:
    - kind: JsonnetSnippet
      name: '*'
```

Wire whatever `Provider` you already use for Flux source CRs; see the
[Flux notification-controller documentation](https://fluxcd.io/) for provider
configuration.

### The alert catalog

The chart ships a starter alert set on the custom metrics plus a handful of
controller-runtime signals. Each alert carries its remediation page as a
`runbook_url` annotation so Alertmanager renders a direct link:

| Alert | Severity | Fires when | Threshold knobs (default) | Runbook |
|---|---|---|---|---|
| `JaaSSnippetReconcileErrorsHigh` | warning | A snippet keeps flipping to Ready=False (excluding `Synced`/`Suspended`/`Pending`). | `reconcileErrorRate` (0.1/s), `reconcileErrorDuration` (10m) | per-reason page under [`/runbooks/`](/runbooks/) |
| `JaaSSnippetArtifactGrowing` | warning | p99 `jaas_snippet_rendered_bytes` exceeds the size ceiling. | `artifactSizeBytes` (16 MiB), `artifactSizeDuration` (30m) | [artifacttoolarge](/runbooks/artifacttoolarge/) |
| `JaaSControllerWorkqueueDepthHigh` | warning | The `jsonnetsnippet` workqueue can't drain. | `workqueueDepth` (50), `workqueueDuration` (15m) | [workqueue-saturation](/runbooks/workqueue-saturation/) |
| `JaaSReconcileLatencyHigh` | warning | p99 reconcile time crosses the ceiling. | `reconcileLatencySeconds` (30), `reconcileLatencyDuration` (15m) | [reconcile-latency](/runbooks/reconcile-latency/) |
| `JaaSOperatorPodDown` | critical | A jaas pod stays NotReady. | `podDownDuration` (5m) | [operator-pod-down](/runbooks/operator-pod-down/) |
| `JaaSStorageSweepFailures` | warning | Background sweeps fail per hour above the floor. | `sweepFailuresPerHour` (3), `sweepFailuresDuration` (30m) | [storage-recovery](/runbooks/storage-recovery/) |
| `JaaSWebhookCertRenewalFailing` | critical | Self-signed cert renewal fails per hour above the floor. | `webhookCertRenewalFailuresPerHour` (1), `webhookCertRenewalFailuresDuration` (30m) | [webhook-cert-renewal](/runbooks/webhook-cert-renewal/) |
| `JaaSTenantTokenMintFailing` | warning | Token mints fail for a `(namespace, serviceAccount)` pair. | `tenantTokenMintFailureRate` (0.01/s), `tenantTokenMintFailureDuration` (10m) | [rbacdenied](/runbooks/rbacdenied/) |
| `JaaSForceDropsAccumulating` | warning | Snippet finalizers are force-dropped per hour above the floor. | `forceDropsPerHour` (0), `forceDropsDuration` (5m) | [storage-recovery](/runbooks/storage-recovery/) |
| `JaaSCRDWatchEngagementFailing` | warning | A Flux source watch won't engage for a GVK. | `crdWatchEngagementFailuresPerHour` (1), `crdWatchEngagementFailuresDuration` (30m) | [crd-watch-engagement](/runbooks/crd-watch-engagement/) |
| `JaaSEvalSaturation` | warning | In-flight evals exceed the saturation ratio of the cap (guarded on the cap being non-zero). | `evalSaturationRatio` (0.9), `evalSaturationDuration` (10m) | [eval-saturation](/runbooks/eval-saturation/) |
| `JaaSEvalRejected` | warning | The semaphore turns evals away per second above the floor. | `evalRejectedRate` (0.05/s), `evalRejectedDuration` (10m) | [eval-saturation](/runbooks/eval-saturation/) |
| `JaaSEvalLeakedGoroutines` | warning | Orphan eval goroutines persist above the floor — a runaway snippet. | `evalLeakedFloor` (0), `evalLeakedDuration` (5m) | [eval-saturation](/runbooks/eval-saturation/) |

`JaaSSnippetReconcileErrorsHigh` templates its runbook URL on the failing reason,
so it lands on the matching per-reason page under [`/runbooks/`](/runbooks/). Each
Ready-condition reason and each alert maps to a remediation page there.

## The Helm chart

The `PrometheusRule` is opt-in under `operator.metrics.prometheusRule` and needs
the Prometheus Operator's `monitoring.coreos.com/v1` API in the cluster:

```yaml
operator:
  enabled: true
  metrics:
    enabled: true
    prometheusRule:
      enabled: true
      interval: 30s
      # Labels your Prometheus instance selects PrometheusRules on.
      labels:
        release: kube-prometheus
      # Merged onto every rendered alert — route all jaas alerts
      # through one Alertmanager receiver.
      extraAlertLabels:
        team: platform
      # Annotation key the runbook URL lands under (Prometheus-operator
      # convention is runbook_url).
      runbookAnnotationKey: runbook_url
```

Every threshold is a knob under `operator.metrics.prometheusRule.thresholds`, so
the noise floor is tunable without copy-pasting rule bodies. To silence a
built-in alert, raise its threshold to an impossibly high value — there is no
per-alert disable toggle, and the threshold pattern keeps "this alert is
intentionally inert" visible in the chart values. Cluster-specific rules append
under a separate group via `operator.metrics.prometheusRule.extraRules`.


---

# Creating source artifacts

Source: https://jaas.projects.metio.wtf/usage/creating-sources/


A `JsonnetSnippet`'s `spec.sourceRef` consumes a Flux source. The operator reads
the referenced source CR's `status.artifact.url`, downloads the tarball Flux's
source-controller serves there, verifies its `status.artifact.digest`, and
extracts it into the snippet's file tree. Every supported kind —
`GitRepository`, `OCIRepository`, and `Bucket` — reaches the operator through
that same `status.artifact` contract, so the operator never talks to a git
remote, an OCI registry, or an object store directly. Flux owns that fetch; the
operator consumes the artifact Flux already produced.
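
The digest check in that flow can be pictured as follows. This is a minimal Python sketch, not the operator's code: it assumes the digest string has the `<algorithm>:<hex>` form (for example a `sha256:` prefix followed by the hex digest), and `verify_artifact` and `SUPPORTED` are hypothetical names.

```python
import hashlib

# Algorithms this sketch accepts; an assumption made for illustration.
SUPPORTED = {"sha256", "sha384", "sha512"}


def verify_artifact(tarball: bytes, digest: str) -> bool:
    """Return True when the tarball bytes hash to the advertised digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm not in SUPPORTED or not expected:
        return False
    actual = hashlib.new(algorithm, tarball).hexdigest()
    return actual == expected.lower()


# Stand-in bytes; a real caller would pass the downloaded tarball.
payload = b"example tarball bytes"
advertised = "sha256:" + hashlib.sha256(payload).hexdigest()
print(verify_artifact(payload, advertised))         # matching content
print(verify_artifact(payload + b"x", advertised))  # altered content
```

Extraction should only proceed when the check passes, so that a truncated or altered download never reaches the snippet's file tree.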

The recipes below show how to produce each source kind so a snippet can reference
it. [Snippet sources](/usage/snippet-sources/) covers wiring the finished source
into a `JsonnetSnippet`. For the source CRDs themselves and their full field
reference, see the [Flux documentation](https://fluxcd.io/) — only what a JaaS
source needs is covered here.

A `JsonnetSnippet` references the source you create with a `spec.sourceRef`:

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: dashboards
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  entryFile: dashboards/api-latency.jsonnet
  sourceRef:
    kind: GitRepository      # or OCIRepository, or Bucket
    name: dashboards-source
    path: dashboards/        # optional: narrow extraction to a subtree
```

The tenant ServiceAccount needs `get` on the referenced source kind. See
[Tenancy and RBAC](/usage/tenancy-and-rbac/) for the exact verbs.

## GitRepository

A `GitRepository` source tracks a branch, tag, or commit of a git repository.
source-controller clones the ref and packs the tree into the tarball the
operator fetches. There is no packaging or layer constraint — the operator
extracts whatever files the commit contains.

1. Lay out your Jsonnet files in a directory. File names and the directory
   structure carry over verbatim into the snippet's file tree, so place the
   entry file where `spec.entryFile` expects it:

   ```text
   dashboards/
   ├── api-latency.jsonnet
   ├── error-budget.jsonnet
   └── lib/
       └── panels.libsonnet
   ```

2. Commit the files and push them to a git repository:

   ```shell
   git add dashboards/
   git commit -m "Add Grafana dashboards"
   git push origin main
   ```

3. Create the `GitRepository` source. With the Flux CLI:

   ```shell
   flux create source git dashboards-source \
     --url=https://github.com/example-org/grafana-dashboards \
     --branch=main \
     --interval=5m \
     --namespace=default \
     --export
   ```

   The equivalent CR YAML, which is authoritative:

   ```yaml
   apiVersion: source.toolkit.fluxcd.io/v1
   kind: GitRepository
   metadata:
     name: dashboards-source
     namespace: default
   spec:
     interval: 5m
     url: https://github.com/example-org/grafana-dashboards
     ref:
       branch: main
   ```

4. Point a snippet's `spec.sourceRef` at the source. Set `kind: GitRepository`,
   `name: dashboards-source`, and optionally `path:` to extract only a subtree:

   ```yaml
   apiVersion: jaas.metio.wtf/v1
   kind: JsonnetSnippet
   metadata:
     name: api-latency-dashboard
     namespace: default
   spec:
     serviceAccountName: dashboards-tenant
     entryFile: dashboards/api-latency.jsonnet
     sourceRef:
       kind: GitRepository
       name: dashboards-source
       path: dashboards/
   ```

When a new commit lands on the tracked branch, source-controller republishes the
artifact and the operator's watch re-renders the snippet.

## OCIRepository

An `OCIRepository` source pulls an OCI artifact from a registry. source-controller
unpacks the artifact's single gzipped-tar layer into the tarball the operator
fetches. Producing the artifact with `flux push artifact` packs a directory into
exactly that shape.

1. Lay out your Jsonnet files in a directory, the same as for a git source:

   ```text
   ./
   ├── main.jsonnet
   └── lib/
       └── panels.libsonnet
   ```

2. Push the directory as an OCI artifact with the Flux CLI. `flux push artifact`
   packs the directory into one gzipped-tar layer and pushes it to the registry:

   ```shell
   flux push artifact oci://ghcr.io/example-org/dashboards:v1 \
     --path=. \
     --source="$(git config --get remote.origin.url)" \
     --revision="$(git rev-parse HEAD)"
   ```

   `--source` and `--revision` stamp provenance metadata onto the artifact;
   set them to a URL and a version identifier of your choosing.

3. Create the `OCIRepository` source. With the Flux CLI:

   ```shell
   flux create source oci dashboards-source \
     --url=oci://ghcr.io/example-org/dashboards \
     --tag=v1 \
     --interval=5m \
     --namespace=default \
     --export
   ```

   The equivalent CR YAML, which is authoritative:

   ```yaml
   apiVersion: source.toolkit.fluxcd.io/v1
   kind: OCIRepository
   metadata:
     name: dashboards-source
     namespace: default
   spec:
     interval: 5m
     url: oci://ghcr.io/example-org/dashboards
     ref:
       tag: v1
   ```

4. Point a snippet's `spec.sourceRef` at the source with `kind: OCIRepository`:

   ```yaml
   apiVersion: jaas.metio.wtf/v1
   kind: JsonnetSnippet
   metadata:
     name: api-latency-dashboard
     namespace: default
   spec:
     serviceAccountName: dashboards-tenant
     entryFile: main.jsonnet
     sourceRef:
       kind: OCIRepository
       name: dashboards-source
   ```

> **Single layer is mandatory.** `flux push artifact` produces an OCI artifact
> with exactly one gzipped-tar layer, which is what source-controller expects
> and the only shape it unpacks. An artifact built any other way — a hand-rolled
> `oras push` with one file per layer, a `Dockerfile`/container-image build, or
> any tool that splits content across multiple layers — is not consumed
> correctly. source-controller cannot reconstruct the file tree, the snippet's
> source never resolves, and the snippet reports `Ready=False`. Always build OCI
> sources with `flux push artifact`.

Verify the layer count before relying on an artifact. Fetch the manifest and
confirm the `layers` array has length 1:

```shell
oras manifest fetch ghcr.io/example-org/dashboards:v1 | \
  jq '.layers | length'
```

A result of `1` is required. Any other number means the artifact was not built
with `flux push artifact` and will not resolve.

### Private registries and Amazon ECR

source-controller performs the pull, so registry credentials belong on the
`OCIRepository` (or on source-controller itself) — never on the JaaS operator or a
snippet's ServiceAccount. The same applies to a `JsonnetLibrary` whose `sourceRef`
points at an `OCIRepository`.

For a generic private registry, point `spec.secretRef` at a `docker-registry`
Secret. For **Amazon ECR you need no pull Secret at all**: set `spec.provider: aws`
and source-controller authenticates with its own ambient AWS identity. On EKS that
is an IRSA role bound to **source-controller's** ServiceAccount with ECR read
permissions:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: dashboards-source
  namespace: default
spec:
  interval: 5m
  provider: aws
  url: oci://111122223333.dkr.ecr.eu-west-1.amazonaws.com/dashboards
  ref:
    tag: v1
```

The IRSA role needs `ecr:GetAuthorizationToken` (resource `*`) plus
`ecr:BatchGetImage` and `ecr:GetDownloadUrlForLayer` on the repository. Because the
credential is source-controller's, one role covers every `OCIRepository` it pulls,
and the JaaS operator stays out of the registry path entirely.

This IRSA role is source-controller's, not the JaaS operator's. The JaaS operator
uses IRSA only for its own [S3 storage backend](/usage/storage-and-ha/) — a
separate concern from pulling sources.

There is a third way to load OCI content that does not go through a `sourceRef` at
all: the chart can mount snippets and libraries from **OCI image volumes**
(`snippets` / `additionalLibraries`), read straight from a registry into the pod.
Those volumes are pulled by the **kubelet**, exactly like a container image — so
they authenticate the way images do, not through IRSA. On EKS that means the
**node's** IAM role with ECR read (the `AmazonEC2ContainerRegistryReadOnly`
managed policy the default node role already carries), or an `imagePullSecret` on
the pod. Pod-level IRSA grants the *pod's* ServiceAccount AWS API access, which the
kubelet does not use when pulling images, so it is not the mechanism for this path.
With a node role that can read ECR, image-volume snippets and libraries load with
no pull Secret. Static OCI mounts and operator mode are mutually exclusive in one
release, so a given install uses either these mounts or the `sourceRef` path above,
not both.

The [jsonnet-oci-images (JOI)](https://github.com/metio/jsonnet-oci-images)
project enforces this same single-layer rule for every image it publishes, so
its images are ready-made single-layer `OCIRepository` sources. Reference a JOI
image directly when you need a shared Jsonnet library tree (grafonnet, the
jsonnet-libs catalog) rather than building and maintaining your own OCI source.

## Bucket

A `Bucket` source mirrors objects from an S3- or GCS-compatible bucket.
source-controller fetches the matching objects, packs them into the tarball the
operator fetches, and there is no layer constraint — the only requirement is
that the objects laid out under the bucket prefix form the file tree your snippet
expects.

1. Produce the files to upload. Either upload the individual `.jsonnet` /
   `.libsonnet` files under a prefix, or pack them into a single archive — both
   work, source-controller flattens the mirrored objects into the file tree:

   ```text
   dashboards/
   ├── main.jsonnet
   └── lib/
       └── panels.libsonnet
   ```

2. Upload the files to the bucket under a prefix. With the AWS CLI against an
   S3-compatible endpoint:

   ```shell
   aws s3 cp dashboards/ s3://example-bucket/dashboards/ \
     --recursive \
     --endpoint-url=https://s3.example.com
   ```

3. Create the `Bucket` source. With the Flux CLI:

   ```shell
   flux create source bucket dashboards-source \
     --bucket-name=example-bucket \
     --endpoint=s3.example.com \
     --provider=generic \
     --secret-ref=bucket-credentials \
     --interval=5m \
     --namespace=default \
     --export
   ```

   The equivalent CR YAML, which is authoritative:

   ```yaml
   apiVersion: source.toolkit.fluxcd.io/v1
   kind: Bucket
   metadata:
     name: dashboards-source
     namespace: default
   spec:
     interval: 5m
     provider: generic
     bucketName: example-bucket
     endpoint: s3.example.com
     secretRef:
       name: bucket-credentials
   ```

   The referenced Secret carries the bucket credentials (`accesskey` /
   `secretkey`). See the [Flux documentation](https://fluxcd.io/) for the Secret
   layout and provider-specific fields.

4. Point a snippet's `spec.sourceRef` at the source with `kind: Bucket`. Use
   `path:` to extract only the prefix that holds your Jsonnet:

   ```yaml
   apiVersion: jaas.metio.wtf/v1
   kind: JsonnetSnippet
   metadata:
     name: api-latency-dashboard
     namespace: default
   spec:
     serviceAccountName: dashboards-tenant
     entryFile: main.jsonnet
     sourceRef:
       kind: Bucket
       name: dashboards-source
       path: dashboards/
   ```

## Which source should I use?

| Source          | Use when                                                                                              |
|-----------------|-------------------------------------------------------------------------------------------------------|
| `GitRepository` | Your Jsonnet is human-authored configuration living in a version-controlled git repository.           |
| `OCIRepository` | You want an immutable, content-addressed artifact; must be a single layer, and pairs with JOI images. |
| `Bucket`        | Your artifacts already live in S3- or GCS-compatible object storage.                                  |


---

# Evaluation and security

Source: https://jaas.projects.metio.wtf/usage/evaluation-and-security/


JaaS runs Jsonnet on the server and returns the result over HTTP. Three caps
bound each evaluation, and a small security model governs what a snippet and its
callers can reach. Review and tune both sections before exposing the service to
a wider audience.

## Evaluation caps

| Flag                    | Default                  | Effect                                                                       |
|-------------------------|--------------------------|------------------------------------------------------------------------------|
| `--evaluation-timeout`   | `5s`                     | Wall-clock budget per evaluation. Exceeding it returns `504 evaluation_timeout`. `0` disables the timeout. |
| `--max-stack`            | `500`                    | Maximum Jsonnet call-stack depth. `0` uses go-jsonnet's own default.        |
| `--max-concurrent-evals` | `max(GOMAXPROCS*4, 16)`  | In-flight evaluations allowed at once. Excess requests return `503 evaluation_unavailable`. `0` disables the cap. |

```shell
./jaas \
  --snippet-directory examples/snippets/dashboards \
  --evaluation-timeout 2s \
  --max-stack 1000 \
  --max-concurrent-evals 32
```

The default for `--max-concurrent-evals` bounds worst-case goroutine pile-up
under a runaway snippet. Each in-flight evaluation pins roughly one CPU for its
working set, so raising the cap far above the available parallelism queues work
without adding throughput.
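
The cap behaves like a counting semaphore with a non-blocking acquire: a request either takes a slot immediately or is turned away. The Python sketch below illustrates that pattern under stated assumptions; JaaS itself is a Go binary, and `default_cap`, `EvalGate`, and `handle` are hypothetical names.

```python
import threading


def default_cap(gomaxprocs: int) -> int:
    """The documented default: max(GOMAXPROCS*4, 16)."""
    return max(gomaxprocs * 4, 16)


class EvalGate:
    """Admits at most `cap` concurrent evaluations; 0 disables the cap."""

    def __init__(self, cap: int):
        self._sem = threading.BoundedSemaphore(cap) if cap > 0 else None

    def try_acquire(self) -> bool:
        # Non-blocking: excess requests are rejected rather than queued.
        return True if self._sem is None else self._sem.acquire(blocking=False)

    def release(self) -> None:
        if self._sem is not None:
            self._sem.release()


def handle(gate: EvalGate, evaluate):
    """Run one evaluation, or reject it when the gate is full."""
    if not gate.try_acquire():
        return 503, "evaluation_unavailable"
    try:
        return 200, evaluate()
    finally:
        gate.release()


print(default_cap(2), default_cap(8))
print(handle(EvalGate(1), lambda: "{}"))
```

Rejecting instead of queueing keeps the number of live evaluations bounded even when every slot is held by a slow snippet.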

## Security model

**Library paths are an unrestricted read scope.** Any file reachable under a
configured `--library-path`, or under a snippet's own directory, can be
`import`-ed or `importstr`-ed by any snippet — go-jsonnet's importer does not
sandbox per snippet. Scope these directories tightly. Never point them at `/`,
`/etc`, or anywhere holding credentials.

**Snippets are operator-controlled, not caller-controlled.** Callers supply only
top-level arguments through the query string. Jsonnet's `import` and `importstr`
require string-literal paths, so a TLA or external variable cannot construct an
import path. Deploying a snippet authored by someone you do not trust is
equivalent to running their code on the server.

**Snippet name resolution is sandboxed.** The URL's snippet segment resolves
through Go's `os.Root`, which rejects `..` traversal and symlinks that escape the
configured snippet directory. A URL like `/jsonnet/../etc/passwd` returns `404`,
even though the OS would otherwise resolve the path.
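
The containment idea can be approximated outside Go. The Python sketch below is a rough analogue, not a reimplementation of `os.Root`: it resolves the candidate path and rejects anything that lands outside the snippet directory. `resolve_snippet` is a hypothetical helper, and unlike `os.Root` this check-then-use approach is not race-free.

```python
import tempfile
from pathlib import Path


def resolve_snippet(root: Path, name: str):
    """Return the resolved file for name, or None if it escapes root."""
    base = root.resolve()
    # resolve() collapses ".." segments and follows symlinks.
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        return None
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    snippets = Path(tmp) / "snippets"
    snippets.mkdir()
    (snippets / "example1.jsonnet").write_text("{}")
    print(resolve_snippet(snippets, "example1.jsonnet") is not None)
    print(resolve_snippet(snippets, "../etc/passwd"))
```

A caller would map a `None` result to the `404` response described above.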

**Evaluation has caps but no mid-flight cancellation.** `--evaluation-timeout`
bounds wall-clock time and `--max-stack` bounds call-stack depth, but go-jsonnet
cannot abort an evaluation already running. When the timeout fires, only the
HTTP response ends: a slow snippet keeps consuming CPU until it finishes
naturally. Size container CPU and memory limits to absorb that worst case.
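
The consequence is easiest to see in a sketch. The Python below illustrates the pattern under stated assumptions (it is not JaaS code, and `evaluate_with_timeout` and `slow_snippet` are hypothetical names): the caller stops waiting at the deadline, while the worker keeps running until it returns on its own.

```python
import threading
import time


def evaluate_with_timeout(evaluate, timeout):
    """Wait up to timeout seconds; the worker itself is never interrupted."""
    result = {}
    done = threading.Event()

    def worker():
        result["value"] = evaluate()
        done.set()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    if done.wait(timeout):
        return 200, result["value"], thread
    # Deadline passed: answer the caller, leave the worker running.
    return 504, "evaluation_timeout", thread


def slow_snippet():
    time.sleep(0.3)  # stands in for an expensive evaluation
    return "late result"


status, body, worker_thread = evaluate_with_timeout(slow_snippet, timeout=0.05)
print(status, body, worker_thread.is_alive())
worker_thread.join()  # the work still runs to completion
```

The worker that outlives its request corresponds to what a gauge of timed-out but still-running evaluations would count.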

The Prometheus metrics `jaas_eval_in_flight` (gauge: live in-flight count),
`jaas_eval_unavailable_total` (counter: cumulative cap rejections), and
`jaas_eval_outstanding_timed_out` (gauge: evals still running after their
request timed out) surface how close evaluation runs to these caps. See
[Observability](/usage/observability/) for detail.

The HTTP status codes these caps produce are documented in the
[rendering endpoint](/usage/rendering-endpoint/) error contract.


---

# External variables and TLAs

Source: https://jaas.projects.metio.wtf/usage/external-variables-and-tlas/


JaaS feeds two kinds of input into an evaluation: external variables, set by the
process owner at startup, and top-level arguments, supplied per request through
the URL query string.

## External variables

External variables come from two sources. The environment mechanism reads every
variable prefixed with `JAAS_EXT_VAR_` — the suffix is the variable name:

```shell
JAAS_EXT_VAR_name=Alice \
JAAS_EXT_VAR_key=secret \
  ./jaas --snippet-directory examples/snippets/dashboards
```

The `--ext-var KEY=VALUE` flag does the same and is repeatable. On a key
conflict, the flag takes precedence over the environment value:

```shell
./jaas \
  --snippet-directory examples/snippets/dashboards \
  --ext-var name=Alice \
  --ext-var key=secret
```

A snippet reads a variable with `std.extVar`:

```jsonnet
{
  person1: {
    name: std.extVar('name'),
    welcome: 'Hello ' + self.name + '!',
    external: std.extVar('key'),
  },
}
```

Fetching `example1` with those variables set produces:

```shell
curl http://127.0.0.1:8080/jsonnet/example1
```

```json
{
  "person1": {
    "external": "secret",
    "name": "Alice",
    "welcome": "Hello Alice!"
  }
}
```

External variables are fixed at startup. Callers cannot set them per request —
that is what top-level arguments are for.

## Top-level arguments

A snippet that evaluates to a function receives top-level arguments (TLAs) from
the URL query string. The `tla-example` snippet is such a function:

```jsonnet
function(something="value", other="more", required)
  {
    person1: {
      welcome: 'Hello ' + something + '!',
      key: other,
      required: std.parseJson(required),
    },
  }
```

Each query parameter sets a TLA. A single value becomes a string:

```shell
curl 'http://127.0.0.1:8080/jsonnet/tla-example?something=Ada&required=42'
# {"person1":{"key":"more","required":42,"welcome":"Hello Ada!"},...}
```

A repeated parameter becomes a list. The `multi-tla` snippet joins whatever it
receives:

```jsonnet
function(tags=["default"])
  {
    count: std.length(tags),
    list: tags,
    joined: std.join(", ", tags),
  }
```

```shell
curl 'http://127.0.0.1:8080/jsonnet/multi-tla?tags=blue&tags=green'
# {"count":2,"joined":"blue, green","list":["blue","green"]}
```

A bare parameter with no value sets the TLA to an empty string:

```shell
curl 'http://127.0.0.1:8080/jsonnet/tla-example?something&required=0'
```
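
The three mapping rules above (single value, repeated parameter, bare parameter) can be summarised in a few lines. This Python sketch mirrors the described behaviour using the standard library's query parser; it is an illustration, not the JaaS implementation, and `tlas_from_query` is a hypothetical name.

```python
from urllib.parse import parse_qs


def tlas_from_query(query: str) -> dict:
    """Map a URL query string to top-level arguments."""
    # keep_blank_values retains bare parameters as empty strings.
    parsed = parse_qs(query, keep_blank_values=True)
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in parsed.items()
    }


print(tlas_from_query("something=Ada&required=42"))
print(tlas_from_query("tags=blue&tags=green"))
print(tlas_from_query("something&required=0"))
```

Every value arrives as a string, which is why the `tla-example` snippet runs `required` through `std.parseJson` to get a number.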

For the request and response shape these examples ride on, see the
[rendering endpoint](/usage/rendering-endpoint/).


---

# JOI images

Source: https://jaas.projects.metio.wtf/usage/joi-images/


[Jsonnet OCI Images](https://github.com/metio/jsonnet-oci-images) (JOI) package
popular Jsonnet libraries as single-layer OCI images, one per upstream library,
published at `ghcr.io/metio/joi-<org>-<repo>`. Because each image is a single
layer, the same artifact serves two roles: a container **image volume** mounted
into jaas, and a Flux **`OCIRepository`** source the operator fetches — so a
snippet imports a vendored library without bundling it.

Deploy them with the [joi Helm chart](/installation/helm-values/#joi-library-chart),
which renders a `JsonnetLibrary` + `OCIRepository` pair for each enabled library.
A snippet then imports a library by its alias, choosing the version in the import
path:

```jsonnet
import 'github.com/jsonnet-libs/k8s-libsonnet/1.34/main.libsonnet'
```

The catalog on the [JOI images page](https://jaas.projects.metio.wtf/usage/joi-images/)
is generated from the
[jsonnet-oci-images manifest](https://github.com/metio/jsonnet-oci-images/blob/main/libraries.json),
so it always reflects the currently published set. Reference an image by the
moving `:latest` tag, or pin it to an immutable dated `:<YYYY.M.D>` snapshot.


---

# Jsonnet libraries

Source: https://jaas.projects.metio.wtf/usage/jsonnet-libraries/


Snippets import reusable Jsonnet from two places: namespaced `JsonnetLibrary`
custom resources and OCI-mounted shared libraries the operator carries on disk.
Both feed the same import-alias namespace, so a snippet's `import` statements
look identical regardless of where the library comes from.

## The JsonnetLibrary CRD

A `JsonnetLibrary` is a namespaced bundle of `.libsonnet` files. Like a snippet,
it declares exactly one source — inline `spec.files` or a `spec.sourceRef` to a
Flux source (`GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`).
The library carries no registration name of its own; the import alias is chosen
on the snippet side.

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetLibrary
metadata:
  name: grafana-helpers
  namespace: default
spec:
  files:
    dashboard.libsonnet: |
      {
        new(title): {
          title: title,
          panels: [],
          schemaVersion: 38,
        },
      }
    panel.libsonnet: |
      {
        graph(title): { type: 'graph', title: title },
        stat(title): { type: 'stat', title: title },
      }
```

A `JsonnetLibrary` whose `spec.sourceRef` points at an `OCIRepository` lets you
ship a jb-vendored library tree (grafonnet, docsonnet, and similar) as an OCI
artifact and import it from snippets without inlining every file.

## Referencing a library from a snippet

A snippet enumerates the libraries it can import in `spec.libraries[]`. Each
entry is a `LibraryRef`:

- `kind` — `JsonnetLibrary` (the only library kind).
- `name` — the `JsonnetLibrary` resource's name.
- `importPath` — the alias the snippet's `import` statements use. Defaults to the
  library's `name`.

A library not listed in `spec.libraries` is invisible to the snippet even when it
exists in the same namespace — the enumeration is the allowlist.

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: my-dashboard
  namespace: default
spec:
  serviceAccountName: grafana-tenant
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      local dashboard = import 'grafana/dashboard.libsonnet';
      local panel = import 'grafana/panel.libsonnet';
      dashboard.new('API Latency') + {
        panels: [
          panel.graph('p99 by route'),
          panel.stat('error rate'),
        ],
      }
  libraries:
    - kind: JsonnetLibrary
      name: grafana-helpers
      importPath: grafana
```

The snippet imports `grafana/dashboard.libsonnet` because the `LibraryRef` sets
`importPath: grafana`. Drop `importPath` and the alias defaults to the library's
name, `grafana-helpers`. The operator reads the `JsonnetLibrary` through the
tenant's impersonating client, so the snippet's ServiceAccount needs `get` on
`jsonnetlibraries.jaas.metio.wtf` — see
[Tenancy and RBAC](/usage/tenancy-and-rbac/).

## OCI-mounted shared libraries

Cluster-wide shared libraries are mounted into the operator pod's filesystem
rather than expressed as CRs. The operator scans every `--library-path`
directory at startup, reads every `.libsonnet` / `.jsonnet` / `.json` file into
memory, and folds those entries into every snippet's import namespace
additively — after the snippet's own `LibraryRef` resolution. A snippet imports
an OCI-mounted library by alias with no `LibraryRef` at all:

```jsonnet
local grafonnet = import 'grafonnet/main.libsonnet';
```

With the Helm chart this is the `additionalLibraries` value, which mounts each
configured OCI artifact under a `--library-path` directory. There is deliberately
no cluster-scoped library CRD: a snippet produces a namespaced
`ExternalArtifact`, so producers stay namespaced, while genuinely cluster-wide
shared libraries take the OCI-mount path. This is also the path the cluster-free
local renderer uses, so the same library tree renders identically on a
workstation and in the cluster.

### Library-alias safety

An OCI-mounted alias and a `LibraryRef` alias must not collide. When the operator
starts with `--library-path` flags it records every mounted alias, and both
admission and the reconciler reject any `LibraryRef` whose `importPath` (or, when
`importPath` is omitted, the library `name`) shadows one of those names. This
catches the case where grafonnet is mounted via OCI but a `JsonnetLibrary`
`LibraryRef` is also aliased `grafonnet` — the additive merge would otherwise
resolve the collision silently in favor of the CR. Rename the import alias or
remove the `LibraryRef` to resolve the rejection.
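
The collision rule reduces to computing each `LibraryRef`'s effective alias and checking it against the mounted set. A minimal Python sketch under those assumptions; `effective_alias` and `shadowed_aliases` are hypothetical helpers, and the dictionaries stand in for `spec.libraries[]` entries.

```python
def effective_alias(library_ref: dict) -> str:
    """importPath when set, otherwise the library name."""
    return library_ref.get("importPath") or library_ref["name"]


def shadowed_aliases(mounted_aliases, library_refs):
    """Return the LibraryRef aliases that collide with a mounted alias."""
    mounted = set(mounted_aliases)
    return [
        effective_alias(ref)
        for ref in library_refs
        if effective_alias(ref) in mounted
    ]


mounted = ["grafonnet"]
refs = [
    {"kind": "JsonnetLibrary", "name": "grafana-helpers", "importPath": "grafana"},
    {"kind": "JsonnetLibrary", "name": "my-grafonnet", "importPath": "grafonnet"},
]
print(shadowed_aliases(mounted, refs))
```

A non-empty result is the case the text describes as rejected; renaming the second entry's `importPath` clears it.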

## JsonnetLibrary vs a source-output snippet

A `JsonnetLibrary` and a `JsonnetSnippet` rendered in `source` output mode both
hand Jsonnet to another snippet, so they can look interchangeable. The defining
difference is whether they publish an artifact; the others follow from it:

| | `JsonnetLibrary` | `source`-output `JsonnetSnippet` |
|---|---|---|
| Role | Passive dependency | Active producer |
| Reached by | `import` by alias | `spec.sourceRef` to its `ExternalArtifact` |
| When it loads | In-process during a snippet's evaluation | Fetched as a tarball before evaluation |
| Scope | Same namespace as the importing snippet | Cross-namespace via the `ExternalArtifact` |
| Publishes an artifact | No | Yes — content-addressed and revisioned |

A `JsonnetLibrary` is a passive dependency: a snippet lists it in
`spec.libraries`, imports it by alias, and the operator folds its files into the
import namespace in-process while evaluating. It publishes no artifact and is
visible only within its own namespace.

A `source`-output `JsonnetSnippet` is an active producer: it publishes an
`ExternalArtifact` carrying its raw Jsonnet. That artifact is content-addressed,
revisioned, and consumable across namespaces, so a downstream snippet pins it
with `spec.sourceRef` and re-evaluates the Jsonnet itself.

Use a `JsonnetLibrary` for shared helpers your snippets import by alias. Use
`output: source` chaining when one snippet's Jsonnet should feed another as a
pinned Flux artifact — see
[Chaining snippets](/usage/snippet-sources/#chaining-snippets).

## Import resolution

The operator's in-memory importer resolves `import` and `importstr` statements
with the same semantics as `jsonnet -J vendor`. A jb-vendored library tree
renders identically on the operator path and locally — this parity is the reason
the same Jsonnet works on a workstation and in the cluster without change. For
an import path, resolution proceeds:

1. **Sibling-relative** — relative to the importing file within its own root, so
   a bare `import 'dashboard.libsonnet'`, `./x`, or `../x` resolves against the
   importing file's directory first.
2. **Bare alias** — a registered alias on its own resolves to that library's
   `main.libsonnet`.
3. **Alias plus file** — `alias/file` resolves `file` within the registered
   alias's tree; the alias head is authoritative.
4. **JPATH / vendor search** — the import path is searched across the snippet's
   own files and then every library, which is what lets an absolute
   `import 'github.com/grafana/grafonnet/gen/...'` resolve against a library
   whose tree carries the full vendor path.

Sibling files win over a library's default entry, matching `jsonnet -J vendor`.
A path containing a slash whose head is not a registered alias is not an error;
it falls through to the vendor search.
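
The resolution order can be sketched over in-memory file maps. The Python below is an illustrative reading of the four steps, not the operator's importer: `resolve_import` is a hypothetical function, trees are modelled as dictionaries keyed by path, and treating a missing file under a registered alias as a failed import (rather than falling through) is this sketch's interpretation of "the alias head is authoritative".

```python
import posixpath


def resolve_import(path, importer_root, importer_file, snippet_files, libraries):
    """Return (root, file) for an import path, or None when nothing matches.

    Root "" is the snippet's own tree; any other root is a library alias.
    """
    roots = {"": snippet_files, **libraries}

    # 1. Sibling-relative: look next to the importing file, inside its root.
    sibling = posixpath.normpath(
        posixpath.join(posixpath.dirname(importer_file), path)
    )
    escapes = sibling == ".." or sibling.startswith("../")
    if not escapes and sibling in roots[importer_root]:
        return importer_root, sibling

    # 2. Bare alias: resolve to that library's main.libsonnet.
    if path in libraries and "main.libsonnet" in libraries[path]:
        return path, "main.libsonnet"

    # 3. Alias plus file: the alias head decides which tree is searched.
    head, _, rest = path.partition("/")
    if rest and head in libraries:
        return (head, rest) if rest in libraries[head] else None

    # 4. Vendor search: the snippet's files first, then every library.
    for root, files in roots.items():
        if path in files:
            return root, path
    return None


snippet = {"main.jsonnet": "", "dashboard.libsonnet": ""}
libs = {
    "grafana": {"main.libsonnet": "", "panel.libsonnet": ""},
    "vendor": {"github.com/grafana/grafonnet/main.libsonnet": ""},
}
print(resolve_import("dashboard.libsonnet", "", "main.jsonnet", snippet, libs))
print(resolve_import("grafana/panel.libsonnet", "", "main.jsonnet", snippet, libs))
```

The file contents are left empty because only the lookup order matters here.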

## Related pages

- [Snippet sources](/usage/snippet-sources/) — where a snippet's own Jsonnet
  comes from, including the same `sourceRef` mechanism libraries use.
- [Snippets and libraries](/usage/snippets-and-libraries/) — the on-disk
  equivalent for the HTTP renderer, including `--library-path` precedence.


---

# Logging

Source: https://jaas.projects.metio.wtf/usage/logging/


JaaS logs through Go's `log/slog`. Every request, reconcile, and lifecycle event
is a structured record you can filter and parse rather than scrape with a regex.
In operator mode, controller-runtime's own output — leader election, cache sync,
manager startup — flows through the **same** slog handler via the logr bridge
(`ctrl.SetLogger(logr.FromSlogHandler(...))`), so the manager's logs share the
configured level and format instead of emitting controller-runtime's default zap
output.

## The binary

Two flags control logging. They apply in every mode JaaS runs in:

- `--log-level` — `debug`, `info`, `warn`, or `error`. Default `info`.
- `--log-format` — `json` or `text`. Default `json`.

`json` emits one JSON object per line, the right choice for a log pipeline
(Loki, Elasticsearch, Cloud Logging) that indexes structured fields. `text`
emits human-readable key=value lines, handy when tailing logs at a terminal
during local development.

The full flag list with defaults is on the
[configuration page](/installation/configuration/).

### Reading logs

With the default JSON format, pipe `kubectl logs` through `jq`. Tail the
operator and pretty-print:

```shell
kubectl --namespace jaas logs deployment/jaas --follow | jq .
```

Filter to warnings and errors only:

```shell
kubectl --namespace jaas logs deployment/jaas | jq 'select(.level == "WARN" or .level == "ERROR")'
```

Follow a single snippet's reconciles by selecting on the logged fields:

```shell
kubectl --namespace jaas logs deployment/jaas | jq 'select(.namespace == "team-a" and .name == "dashboards")'
```

Turn `--log-level=debug` on temporarily to see per-request evaluation detail and
the operator's reconcile decisions; leave it at `info` in production to keep the
volume down.
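
Where `jq` is not available, the same selections can be made with a few lines of
Python. The sketch below is illustrative only: the `level`, `namespace`, and
`name` keys are taken from the `jq` filters above, `msg` is the default message
key of `slog`'s JSON handler, and the sample records are invented for the
example.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical log lines in the default JSON format; real records carry more fields.
SAMPLE_LINES = [
    '{"level":"INFO","msg":"reconciled","namespace":"team-a","name":"dashboards"}',
    '{"level":"WARN","msg":"rate limited","namespace":"team-a","name":"dashboards"}',
    '{"level":"ERROR","msg":"evaluation failed","namespace":"team-b","name":"alerts"}',
    "not json at all",
]


def parse_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per JSON log line, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # for example, a stray line written with --log-format=text
        if isinstance(record, dict):
            yield record


def warnings_and_errors(records: Iterable[Dict]) -> List[Dict]:
    """Same selection as the jq filter on .level shown above."""
    return [r for r in records if r.get("level") in ("WARN", "ERROR")]


def for_snippet(records: Iterable[Dict], namespace: str, name: str) -> List[Dict]:
    """Same selection as the jq filter on .namespace and .name shown above."""
    return [
        r for r in records
        if r.get("namespace") == namespace and r.get("name") == name
    ]
```

Passing `sys.stdin` instead of `SAMPLE_LINES` lets the functions sit at the end
of a `kubectl logs` pipe.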

## The Helm chart

The chart exposes both flags under `arguments`:

```yaml
arguments:
  # debug, info, warn, error
  logLevel: info
  # json, text
  logFormat: json
```

These map one-to-one onto `--log-level` and `--log-format`. Keep `logFormat:
json` for any cluster whose logs are ingested by a structured pipeline; switch
to `text` only for ad-hoc local clusters where a human reads the raw stream.


---

# Metrics

Source: https://jaas.projects.metio.wtf/usage/metrics/


The JaaS operator exposes a Prometheus metrics endpoint covering controller-runtime's
standard families plus a custom `jaas_*` family the reconciler registers. Scrape
it for dashboards and feed it into the shipped [alerts](/usage/alerting/).

## The binary

controller-runtime's Prometheus endpoint binds `--metrics-bind-address` (default
`:8083`), serving the standard text exposition format at `/metrics`. Setting it
to `0` disables the endpoint. The default deliberately avoids controller-runtime's
built-in `:8080`, which would collide with the Jsonnet HTTP port.

The full flag list with defaults is on the
[configuration page](/installation/configuration/).

### Metrics reference

The operator exports these custom `jaas_*` metrics, registered against
controller-runtime's registry so they ride the same `/metrics` endpoint:

| Metric | Type | Labels | Meaning |
|---|---|---|---|
| `jaas_snippet_reconcile_total` | counter | `namespace`, `name`, `status`, `reason` | One bump per reconcile that touches the Ready condition. `status` is `True`/`False`; `reason` is the Reason constant from the snippet's condition. |
| `jaas_snippet_rendered_bytes` | histogram | `namespace`, `name` | Rendered artifact size, observed only on `Synced` reconciles. Buckets run 256 B…64 MiB. |
| `jaas_snippet_rate_limited_total` | counter | `namespace`, `name` | Reconciles deferred by the per-snippet token bucket. Paired with the `RateLimited` Warning event. |
| `jaas_snippet_eval_unavailable_total` | counter | `namespace`, `name` | Reconciles deferred because the global concurrent-eval cap was full. Paired with the `EvalUnavailable` Warning event. |
| `jaas_snippet_force_drop_total` | counter | `namespace`, `name`, `reason` | Snippets whose finalizer was force-dropped because `Publisher.Withdraw` kept failing past `--max-withdraw-wait` or hit a permanent API error. `reason` names the trigger (`withdraw_timed_out`, `tenant_client_permanent`, `withdraw_permanent`). Sustained non-zero values mean orphaned tarballs are accumulating; see the [storage-recovery runbook](/runbooks/storage-recovery/). |
| `jaas_eval_in_flight` | gauge | — | Evaluations currently holding a slot in the global concurrent-eval semaphore. Reads through to the live count on every scrape. |
| `jaas_eval_max_concurrent` | gauge | — | Configured ceiling of the semaphore (`--max-concurrent-evals`). Zero means the gate is disabled — any saturation alert must guard on this being non-zero. |
| `jaas_eval_unavailable_total` | counter | — | Process-global accumulator of evaluations the semaphore rejected, across the HTTP and operator paths. Monotonic; resets on restart. |
| `jaas_eval_outstanding_timed_out` | gauge | — | Evaluation goroutines whose parent's context fired before the synchronous go-jsonnet call returned. Sustained non-zero readings flag a runaway snippet. |
| `jaas_storage_sweep_failures_total` | counter | — | Background storage-sweep passes that returned an error. The sweep removes orphaned `.tar.gz.tmp` residue; failures here don't block reconciles but let stale files accumulate. |
| `jaas_webhook_cert_renewal_failures_total` | counter | — | Self-signed cert renewal attempts that returned an error. Sustained non-zero values flag RBAC drift or a write-permission loss on `--webhook-cert-dir`; the existing cert's natural expiry is the deadline before admission breaks cluster-wide. |
| `jaas_tenant_token_mint_failures_total` | counter | `namespace`, `serviceAccount` | `TokenRequest` mints that returned an error. Sustained non-zero values on a pair indicate revoked `serviceaccounts/token: create` or a deleted namespace; affected snippets pin Ready=Unknown. |
| `jaas_crd_watch_engagement_failures_total` | counter | `gvk` | `EngageFluxWatch` calls that returned an error. Sustained non-zero values on a GVK mean dependent snippets won't re-render on upstream source events until the watch engages. |

The eval gauges (`jaas_eval_in_flight`, `jaas_eval_max_concurrent`,
`jaas_eval_outstanding_timed_out`) reflect the global concurrent-eval cap; see
[evaluation and security](/usage/evaluation-and-security/) for how that cap works
and how to size `--max-concurrent-evals`.

Alongside these, controller-runtime contributes its standard families for free —
`controller_runtime_reconcile_total`, `controller_runtime_reconcile_time_seconds`,
the `workqueue_*` series (depth, latency, retries), and the Go/process
collectors. The shipped [alerts](/usage/alerting/) build on both the `jaas_*`
metrics and these controller-runtime signals.

### Querying with PromQL

Once scraped, a few PromQL queries answer the common questions:

```promql
# Rate of failed reconciles per snippet, excluding healthy/intentional states.
sum by (namespace, name) (
  rate(jaas_snippet_reconcile_total{status="False",reason!~"Synced|Suspended|Pending"}[5m])
)

# Eval semaphore saturation, guarded on the gate being enabled.
jaas_eval_in_flight / jaas_eval_max_concurrent and jaas_eval_max_concurrent > 0

# p99 rendered artifact size per snippet.
histogram_quantile(0.99, sum by (namespace, name, le) (rate(jaas_snippet_rendered_bytes_bucket[30m])))
```
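
The guard in the saturation query matters outside PromQL too. As a rough sketch,
assuming the two gauges appear unlabelled in the scraped text (the reference
table lists no labels for them) and using an invented sample payload, a script
reading `/metrics` directly could apply the same rule:

```python
from typing import Dict, Optional

# Hypothetical excerpt of a /metrics response; values are made up.
SAMPLE = """\
# TYPE jaas_eval_in_flight gauge
jaas_eval_in_flight 3
# TYPE jaas_eval_max_concurrent gauge
jaas_eval_max_concurrent 4
jaas_snippet_rate_limited_total{namespace="team-a",name="dashboards"} 7
"""


def parse_unlabelled(exposition: str) -> Dict[str, float]:
    """Collect unlabelled samples from Prometheus text exposition output."""
    values: Dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks plus HELP/TYPE comments
        name, _, rest = line.partition(" ")
        if "{" in name:
            continue  # labelled series are out of scope for this sketch
        try:
            values[name] = float(rest.split()[0])
        except (ValueError, IndexError):
            continue
    return values


def eval_saturation(gauges: Dict[str, float]) -> Optional[float]:
    """Return in-flight / ceiling, or None when the gate is disabled."""
    ceiling = gauges.get("jaas_eval_max_concurrent", 0.0)
    if ceiling <= 0:
        return None  # zero means the semaphore is off, so no ratio exists
    return gauges.get("jaas_eval_in_flight", 0.0) / ceiling
```

Returning `None` rather than dividing by zero mirrors the
`jaas_eval_max_concurrent > 0` guard in the query above.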

## The Helm chart

The metrics port is set under `ports`, and the chart wires it to a dedicated
`jaas-metrics` Service whenever the operator is enabled:

```yaml
ports:
  # controller-runtime metrics endpoint; maps to --metrics-bind-address.
  # Turn off operator.metrics.enabled to disable the endpoint entirely.
  metrics: 8083
```

Scraping is configured under `operator.metrics`. A `ServiceMonitor` for the
Prometheus Operator is opt-in — it selects the `jaas-metrics` Service and scrapes
its `metrics` port at `/metrics`:

```yaml
operator:
  enabled: true
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: 10s
      # Labels your Prometheus instance selects ServiceMonitors on.
      labels:
        release: kube-prometheus
```

Without the Prometheus Operator, point a plain Prometheus scrape config at the
`jaas-metrics` Service (port `8083`, path `/metrics`), or add the usual
`prometheus.io/scrape` annotation set to the pod through `pod.additionalLabels`
and let a kubernetes-pods scrape job discover it.

To turn the alerts on, see [Alerting](/usage/alerting/).


---

# Network policy

Source: https://jaas.projects.metio.wtf/usage/network-policy/


The Helm chart ships an opt-in `NetworkPolicy` for the JaaS pod. It is off by
default and renders only when `networkPolicy.enabled` is `true`. Two independent
layers are on offer: pod-scoped allowlists that lock down only JaaS's own pods
(the safe default), and an additional namespace-wide default-deny for a zero-trust
namespace. The ingress and egress tables below describe exactly what traffic JaaS
depends on — in both renderer mode and [operator mode](/usage/operator-mode/) — so
everything else can be denied.

## Two layers: pod-scoped allowlists vs. namespace default-deny

`networkPolicy.enabled: true` renders per-workload, **pod-scoped allowlist**
policies. They select only JaaS's own pods through their `app.kubernetes.io/*`
labels and lock down just those pods to the required ports. This is the safe
default and is fine in a shared namespace: co-located workloads — including
anything in `flux-system` if JaaS shares that namespace — are untouched.

`networkPolicy.defaultDeny.enabled` (default `false`) **additionally** renders a
namespace-wide default-deny so every pod in the namespace is denied by default and
the allowlists become the only exceptions (a zero-trust namespace). The default-deny
sits at a lower precedence than the allowlists, so the allowlists always win for the
JaaS pods while everything else is denied.

Pick the layer that matches namespace ownership:

- **`defaultDeny.enabled: false`** (default) — pod-scoped setup. Only JaaS's pods
  are locked down; neighbours keep whatever posture their own policies give them.
- **`defaultDeny.enabled: true`** — namespace zero-trust. Enable this **only when
  JaaS owns its namespace**, because the deny-all also denies every co-located
  workload that does not have its own allowing policy.

`defaultDeny.order` (default `2000`) tunes the Calico `order` / ClusterNetworkPolicy
`priority` that keeps the deny-all subordinate to the allowlists. The `kubernetes`
and `cilium` engines have no precedence knob — deny and allow combine additively and
allow wins — so the value matters only for the `calico` and `clusterNetworkPolicy`
engines.

```yaml
networkPolicy:
  enabled: true
  defaultDeny:
    enabled: true   # only when JaaS owns this namespace
    order: 2000
```

## Choosing a policy engine

`networkPolicy.engine` selects which policy dialect the chart renders. It is
explicit, not auto-detected: a chart that sniffed the running CNI would render
different objects on different clusters from identical values, which breaks GitOps
determinism. You name the engine, and the rendered manifest is the same everywhere.

| `engine` | Renders | API | FQDN egress |
| --- | --- | --- | --- |
| `kubernetes` (default) | `NetworkPolicy` | `networking.k8s.io/v1` | No |
| `cilium` | `CiliumNetworkPolicy` | `cilium.io/v2` | Yes — free `toFQDNs` egress |
| `calico` | `NetworkPolicy` | `projectcalico.org/v3` | No — OSS Calico has no FQDN egress; that is Calico Enterprise only |
| `clusterNetworkPolicy` | `ClusterNetworkPolicy` | `policy.networking.k8s.io/v1alpha2` | No |

`clusterNetworkPolicy` renders the SIG-Network `ClusterNetworkPolicy` that
consolidates the deprecated `AdminNetworkPolicy` + `BaselineAdminNetworkPolicy`
APIs into one resource. It is alpha, cluster-scoped, and rendered in the `Baseline`
tier so a developer-authored `NetworkPolicy` still takes precedence over it.

```yaml
networkPolicy:
  enabled: true
  engine: cilium
```

The per-port `.from` knobs documented under [Configuring ingress](#configuring-ingress)
apply to the `kubernetes` engine only. For the other engines the allowlists are
pod-scoped allow-all on the required ports, and you tighten them through that
engine's native passthrough lists — `networkPolicy.<engine>.ingress` and
`networkPolicy.<engine>.egress` — which are merged verbatim into the rendered
policy's `spec`. For example, adding identity-based ingress and a `toFQDNs` egress
under the Cilium engine:

```yaml
networkPolicy:
  enabled: true
  engine: cilium
  cilium:
    ingress:
      - fromEndpoints:
          - matchLabels:
              app.kubernetes.io/name: kustomize-controller
    egress:
      - toFQDNs:
          - matchName: bucket.example.com
        toPorts:
          - ports:
              - port: "443"
                protocol: TCP
```

## Required traffic

The traffic JaaS needs depends on the mode it runs in. The renderer-mode rows apply
to every install; the operator-mode rows apply only when `operator.enabled` is
`true`.

### Ingress

| Port | Source | Mode | Selectable by label? |
|---|---|---|---|
| Jsonnet HTTP (`ports.http`, `8080`) | Callers of the `/jsonnet` endpoint, or an Ingress controller fronting the Service | always | Yes — or open when an Ingress fronts it |
| Management probes (`ports.management`, `8081`) | The kubelet, dialing the readiness, liveness, and startup probes from the node IP | always | No — the node IP is not a pod, so it cannot be a `podSelector` |
| Storage HTTP (`ports.storage`, `8082`) | The Flux consumers that dereference `ExternalArtifact` tarballs — kustomize-controller, helm-controller, and custom consumers such as stageset-controller | operator | Yes — by consumer namespace |
| Webhook (`ports.webhook`, `9443`) | The kube-apiserver, dialing the validating admission webhook | operator + webhook | No — the apiserver is not a pod |
| Metrics (`ports.metrics`, `8083`) | Prometheus scraping `/metrics` | operator + metrics | Yes — by the scraper's pod or namespace |

The Jsonnet HTTP and management ports always get an ingress rule. The storage,
webhook, and metrics ports each get their own rule when their mode is active —
storage when `operator.enabled`, webhook when the operator's webhook is enabled,
and metrics when the operator's metrics endpoint is enabled.

The kubelet and the apiserver source traffic from addresses that are not pods, so
their rules cannot be narrowed with a `podSelector` or `namespaceSelector`. Leaving
the management and webhook `from` lists empty keeps those ports reachable, which is
what lets probes succeed and the apiserver reach the webhook. Authenticity on the
webhook port is enforced by TLS and the CA bundle on the
`ValidatingWebhookConfiguration`, not by the network layer — see the
[admission webhook page](/usage/admission-webhook/).

### Egress

Egress only matters when you opt into it (`networkPolicy.egress.enabled`). The JaaS
operator needs the following outbound flows; in renderer mode it needs only DNS, if
that.

| Destination | Purpose | Mode | Selectable by label? |
|---|---|---|---|
| Cluster DNS | Name resolution — without it every other egress flow fails | always | Yes — by the DNS namespace |
| kube-apiserver | TokenRequest minting, CR reads, `ExternalArtifact` writes, leader election, and webhook caBundle patching | operator | No — `ipBlock` CIDR only |
| source-controller | Fetching upstream artifacts for snippets that use a `sourceRef` | operator | Yes — the `flux-system` namespace |
| S3 endpoint | Reading and writing tarballs when `storage.backend` is `s3` | operator + S3 | Depends — in-cluster MinIO is label-selectable; an external bucket is `ipBlock` only |
| OTLP collector | Shipping traces when `operator.tracing.endpoint` is set | operator + tracing | Depends — in-cluster collector is label-selectable; an external one is `ipBlock` only |

The kube-apiserver is never label-selectable, so its egress rule must be an
`ipBlock` CIDR. The same applies to any S3 bucket or OTLP collector that lives
outside the cluster.

## Configuring ingress

Under the `kubernetes` engine, enable the policy and tighten each port through its
`from` knob. An empty `from` list leaves that port open; a non-empty list restricts
it to the listed peers.

```yaml
networkPolicy:
  enabled: true
  # Open by default — typical when an Ingress fronts the Service. Set a
  # from-list to restrict callers of the /jsonnet endpoint.
  http:
    from: []
  # Leave empty — the kubelet probes source from the node IP.
  management:
    from: []
  # Leave empty — the kube-apiserver cannot be expressed as a podSelector.
  webhook:
    from: []
```

The storage port defaults to allowing any pod in `flux-system`, the namespace where
the stock Flux consumers run. Add an entry per extra consumer namespace — for
example a stageset-controller running in `stageset-system`:

```yaml
networkPolicy:
  enabled: true
  storage:
    from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: flux-system
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: stageset-system
```

The metrics port has its own ingress rule, rendered when both `operator.enabled` and
`operator.metrics.enabled` are set. Scope it to your monitoring namespace through
`networkPolicy.metrics.from`:

```yaml
networkPolicy:
  enabled: true
  metrics:
    from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: monitoring
```

Anything the per-port knobs do not cover goes into `additionalIngress`, which is
merged verbatim into the policy:

```yaml
networkPolicy:
  enabled: true
  additionalIngress:
    - ports:
        - protocol: TCP
          port: 8080
      from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```

## Opt-in egress

Egress is off by default, and deliberately so. Adding the `Egress` policy type
flips the JaaS pod to default-deny for outbound traffic — everything not explicitly
allowed is dropped. Getting the allow-list complete is the cluster operator's risk,
because the two destinations the operator needs most — the kube-apiserver and any
external S3 or OTLP endpoint — are not label-selectable and so depend on `ipBlock`
CIDRs that vary per cluster. An incomplete list does not fail loudly; it silently
cuts the operator off.

> **Warning:** Enabling egress **without** an `ipBlock` for the kube-apiserver cuts
> the operator off from the control plane. It can no longer mint tokens, read CRs,
> publish `ExternalArtifact` resources, hold the leader-election lease, or patch the
> webhook caBundle. Always include the apiserver CIDR before turning egress on.

Find the apiserver's address with:

```shell
kubectl --namespace default get endpoints kubernetes -o jsonpath='{.subsets[*].addresses[*].ip}'
```

Use that IP as a `/32` (or your control plane's CIDR for an HA apiserver). A
complete operator egress block — DNS, the apiserver, source-controller, S3, and an
OTLP collector — looks like this:

```yaml
networkPolicy:
  enabled: true
  egress:
    enabled: true
    # DNS to the cluster DNS namespace. Without this, every flow below
    # fails name resolution.
    dns: true
    dnsNamespace: kube-system
    to:
      # kube-apiserver — not label-selectable, so an ipBlock CIDR.
      # Replace with the IP(s) from the command above.
      - to:
          - ipBlock:
              cidr: 10.0.0.1/32
        ports:
          - protocol: TCP
            port: 443
      # source-controller — fetching upstream artifacts for sourceRef snippets.
      - to:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: flux-system
      # S3 bucket — an ipBlock CIDR for an external endpoint. For in-cluster
      # MinIO, use a namespaceSelector instead.
      - to:
          - ipBlock:
              cidr: 203.0.113.0/24
        ports:
          - protocol: TCP
            port: 443
      # OTLP collector — an ipBlock CIDR for an external endpoint. For an
      # in-cluster collector, use a namespaceSelector instead.
      - to:
          - ipBlock:
              cidr: 198.51.100.10/32
        ports:
          - protocol: TCP
            port: 4317
```

Trim this to what your install actually uses: drop the S3 block on the local storage
backend, and drop the OTLP block when [tracing](/usage/observability/) is off. The
apiserver and DNS rules are non-negotiable for the operator. Storage destinations
are covered on the [storage and high availability page](/usage/storage-and-ha/), and
tenancy on the [tenancy and RBAC page](/usage/tenancy-and-rbac/).
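
Turning the addresses printed by the `kubectl` command into `ipBlock` entries is
mechanical. The helper below is a hypothetical convenience, not part of the
chart: it emits one entry per address in the same shape as the `to` list above,
and its default port of `443` simply mirrors that example, so check the port
your apiserver endpoints actually expose.

```python
import ipaddress
from typing import Dict, List


def apiserver_egress(addresses: str, port: int = 443) -> List[Dict]:
    """Build egress entries from the space-separated IPs kubectl prints."""
    entries = []
    for raw in addresses.split():
        # ip_network() on a bare address yields a single-host network:
        # /32 for IPv4 and /128 for IPv6.
        network = ipaddress.ip_network(raw)
        entries.append({
            "to": [{"ipBlock": {"cidr": str(network)}}],
            "ports": [{"protocol": "TCP", "port": port}],
        })
    return entries
```

Serialising the returned list as YAML gives entries that can be pasted under
`networkPolicy.egress.to`.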

For the full set of chart values, see the
[chart README](https://github.com/metio/helm-charts/tree/main/charts/jaas).


---

# Observability

Source: https://jaas.projects.metio.wtf/usage/observability/


JaaS gives you four ways to see what it is doing in a cluster. Structured logs
tell you what happened on a single request or reconcile; traces follow one
operation across its spans; metrics aggregate behaviour into time series for
dashboards and alerts; and alerts plus Kubernetes Events turn a sustained
problem into a page or a notification.

Each pillar has its own page covering both the binary's flags and the Helm chart
keys that drive them:

- [Logging](/usage/logging/) — the `log/slog` logger, `--log-level` and
  `--log-format`, and reading JSON logs with `kubectl logs` and `jq`.
- [Tracing](/usage/tracing/) — OTLP gRPC export to an OpenTelemetry collector,
  sampling, and viewing spans.
- [Metrics](/usage/metrics/) — the Prometheus endpoint, the custom `jaas_*`
  metric family, scraping with a `ServiceMonitor`, and querying with PromQL.
- [Alerting](/usage/alerting/) — the opt-in `PrometheusRule` alert catalog with
  its runbook links, plus Kubernetes Events routed through Flux's
  notification-controller.

Logging applies to every mode JaaS runs in. Tracing, metrics, and alerting are
operator-mode concerns and take effect once `--enable-flux-integration` is set
(`operator.enabled` in the chart).


---

# Operator mode

Source: https://jaas.projects.metio.wtf/usage/operator-mode/


JaaS can run as a Kubernetes operator alongside its HTTP renderer. In this mode it
watches custom resources, evaluates the Jsonnet they describe, and publishes the
result as a Flux
[`ExternalArtifact`](https://fluxcd.io/flux/components/source/externalartifacts/)
that downstream controllers consume. The HTTP renderer keeps running; the
operator is an additional set of goroutines, not a separate binary.

## Enabling the operator

Set `--enable-flux-integration` on the binary:

```shell
jaas --enable-flux-integration \
  --storage-path=/var/lib/jaas/artifacts \
  --storage-base-url=http://jaas-storage.jaas.svc:8082
```

`--storage-path` and `--storage-base-url` are required in operator mode — they
tell the operator where to write artifact tarballs and the public URL prefix
downstream consumers fetch them from.

With the [Helm chart](/installation/), set `operator.enabled: true`:

```yaml
operator:
  enabled: true
```

The chart wires the storage paths, leader election, RBAC, and the metrics
Service for you.

## The two custom resources

The operator watches two CRDs in the `jaas.metio.wtf` API group
(`apiVersion: jaas.metio.wtf/v1`). Both are namespaced.

| Kind             | Scope      | Purpose                                                                          |
|------------------|------------|----------------------------------------------------------------------------------|
| `JsonnetSnippet` | Namespaced | A Jsonnet snippet to evaluate and publish as an `ExternalArtifact`.              |
| `JsonnetLibrary` | Namespaced | Reusable `.libsonnet` files that snippets in the same namespace import.          |

A `JsonnetSnippet` is the published unit. A `JsonnetLibrary` carries no artifact
of its own; it exists to be imported by snippets. The full field reference for
`JsonnetSnippet` lives at [/api/jsonnetsnippet/](/api/jsonnetsnippet/); the
library CRD is covered in [Jsonnet libraries](/usage/jsonnet-libraries/).

## What the operator produces

Each reconcile of a `JsonnetSnippet` evaluates the snippet and writes the result
into a tar.gz, then upserts a Flux `ExternalArtifact` CR whose
`status.artifact.url` points at the operator's storage HTTP server. In the
default `rendered` output mode the archive holds a single `rendered.json` — the
evaluated JSON. The published artifact's URL is also mirrored onto the snippet's
own `status.artifactURL`, so `kubectl describe jsonnetsnippet` answers "where is
my rendered output?" without a second lookup.
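
For a custom consumer that downloads the tarball itself, unpacking the default
output comes down to reading one member. A minimal sketch, assuming the archive
bytes were already fetched from `status.artifact.url` and that the member sits
at the archive root under the name `rendered.json`; `read_rendered` is a
hypothetical helper name:

```python
import io
import json
import posixpath
import tarfile


def read_rendered(archive: bytes, member: str = "rendered.json"):
    """Return the parsed JSON document stored in a gzip-compressed tarball."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for info in tar:
            # normpath() tolerates a leading "./" in member names.
            if info.isfile() and posixpath.normpath(info.name) == member:
                return json.load(tar.extractfile(info))
    raise KeyError(f"{member} not found in archive")
```

Reading the member in memory avoids extracting the archive to disk, which also
sidesteps path handling for untrusted member names.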

Any controller that understands Flux's `ExternalArtifact` reads the result by
pointing a `sourceRef` at it:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: consume-rendered
  namespace: default
spec:
  sourceRef:
    kind: ExternalArtifact
    name: hello-world
```

Real consumers of the published artifact include:

- [grafana-operator](https://grafana.github.io/grafana-operator/) — renders
  Grafana dashboards from the evaluated JSON.
- [stageset-controller](https://stageset.projects.metio.wtf/) — drives a staged
  rollout of the rendered manifests.
- Flux's own `kustomize-controller` and `helm-controller`, which apply the
  rendered output as part of a GitOps pipeline.

## A minimal snippet

The simplest `JsonnetSnippet` carries its Jsonnet inline in `spec.files` and
seeds two external variables:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hello-world-tenant
  namespace: default
---
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: hello-world
  namespace: default
spec:
  serviceAccountName: hello-world-tenant
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      {
        greeting: 'hello',
        recipient: std.extVar('audience'),
        timestamp: std.extVar('now'),
      }
  externalVariables:
    audience: world
    now: "2026-06-09T12:00:00Z"
```

`spec.serviceAccountName` names the ServiceAccount the operator impersonates for
every API call this snippet drives — the artifact write, source fetches, library
reads. That ServiceAccount's RBAC, not the operator's, governs what the snippet
can reach. See [Tenancy and RBAC](/usage/tenancy-and-rbac/) for the verbs the
tenant ServiceAccount needs.

## Lifecycle knobs

Two `spec` fields control when and whether the operator reconciles a snippet.
Both mirror Flux's source-controller conventions.

### `spec.suspend`

Set `spec.suspend: true` to pause reconciliation without deleting the snippet.
The operator skips the evaluation pipeline, leaves the existing
`ExternalArtifact` in place, and reports `Ready=False` with reason `Suspended`.
Setting it back to `false` resumes reconciliation. The published artifact stays
available the whole time, so downstream consumers keep reading the last rendered
output while the snippet is paused.

```yaml
spec:
  suspend: true
```

### `spec.interval`

Set `spec.interval` to re-render the snippet on a fixed cadence even when no
watch event fires:

```yaml
spec:
  interval: 10m
```

A `JsonnetSnippet` re-renders whenever its source, libraries, or referenced Flux
sources change. `spec.interval` adds a steady-state cadence on top of that, so
the snippet picks up state outside the watched graph — external-variable
environment drift on the operator pod, OCI library refreshes, and similar. The
interval is bounded at admission to between `30s` and `24h`. Failed reconciles
still use controller-runtime's exponential backoff; the interval governs only
the steady-state cadence.
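
Because the bounds are enforced at admission, an out-of-range interval is
rejected when the manifest is applied. A pre-flight check in a CI script could
catch that earlier. The sketch below is a deliberately narrow approximation: it
understands only whole `h`, `m`, and `s` components, which may be stricter than
what the API server accepts, and `interval_in_bounds` is a hypothetical helper.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_COMPONENT = re.compile(r"(\d+)([hms])")

MIN_SECONDS = 30          # lower bound from the text: 30s
MAX_SECONDS = 24 * 3600   # upper bound from the text: 24h


def parse_interval(value: str) -> int:
    """Convert a duration such as "10m" or "1h30m" to whole seconds."""
    total, pos = 0, 0
    while pos < len(value):
        match = _COMPONENT.match(value, pos)
        if match is None:
            raise ValueError(f"unsupported duration: {value!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


def interval_in_bounds(value: str) -> bool:
    """Check a duration against the 30s to 24h admission bounds."""
    return MIN_SECONDS <= parse_interval(value) <= MAX_SECONDS
```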

## Where to go next

- [Snippet sources](/usage/snippet-sources/) — inline files, a Flux `sourceRef`,
  multi-snippet trees, and chaining one snippet's output into another.
- [Jsonnet libraries](/usage/jsonnet-libraries/) — the `JsonnetLibrary` CRD,
  OCI-mounted shared libraries, and how imports resolve.
- [Tenancy and RBAC](/usage/tenancy-and-rbac/) — per-snippet impersonation and
  the tenant ServiceAccount's permissions.
- [Storage and HA](/usage/storage-and-ha/) — the local and S3 backends, leader
  election, and revision retention.
- [/api/jsonnetsnippet/](/api/jsonnetsnippet/) — the exhaustive field-by-field
  reference.


---

# Rendering endpoint

Source: https://jaas.projects.metio.wtf/usage/rendering-endpoint/


Send a `GET` to the rendering endpoint with a snippet name and JaaS returns the
evaluated Jsonnet as JSON:

```shell
curl http://127.0.0.1:8080/jsonnet/example1
```

The Jsonnet server binds `127.0.0.1:8080` by default (`--listen-address`,
`--port`). The URL shape is `GET /<jsonnet-endpoint-path>/{snippet...}`, where
`{snippet...}` is a trailing path segment that may contain slashes. A successful
response carries `Content-Type: application/json` and the rendered document.

## The endpoint path

The leading path segment defaults to `jsonnet` and is set with
`--jsonnet-endpoint-path`. Running with `--jsonnet-endpoint-path render` moves the
endpoint to `GET /render/{snippet...}`:

```shell
./jaas --jsonnet-endpoint-path render --snippet-directory examples/snippets/dashboards
curl http://127.0.0.1:8080/render/example1
```
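
A client that builds these URLs programmatically needs to keep the slashes in
the snippet name while escaping everything else. One way to do that, sketched
with the standard library; `render_url` is a hypothetical helper and the base
URL is the local default shown above:

```python
from urllib.parse import quote


def render_url(base: str, snippet: str, endpoint_path: str = "jsonnet") -> str:
    """Join base URL, endpoint path, and snippet name into a request URL."""
    # safe="/" keeps the slashes that separate snippet path segments.
    path = quote(snippet.strip("/"), safe="/")
    return f"{base.rstrip('/')}/{endpoint_path.strip('/')}/{path}"
```

The `endpoint_path` argument should match whatever `--jsonnet-endpoint-path`
the server was started with.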

## Snippet resolution

The `{snippet...}` segment names which file JaaS evaluates. Resolution checks
the `--snippet` files first, then looks for `<name>/main.jsonnet` under each
`--snippet-directory`. See
[Snippets and libraries](/usage/snippets-and-libraries/) for how to declare
both.

Resolution is sandboxed through Go's `os.Root`, which rejects `..` traversal and
symlinks that escape the configured directory. A crafted URL never reaches a
file outside the snippet roots:

```shell
curl -i http://127.0.0.1:8080/jsonnet/../etc/passwd
# HTTP/1.1 404 Not Found
```

## Management probes

A second HTTP server — the management server — exposes the Kubernetes lifecycle
probes. It binds `127.0.0.1:8081` by default (`--management-listen-address`,
`--management-port`):

| Path     | Meaning                                                                      |
|----------|------------------------------------------------------------------------------|
| `/live`  | Liveness. Unconditional `200`.                                               |
| `/start` | Startup. Consults health state; `200` once started, otherwise `503` + JSON.  |
| `/ready` | Readiness. Consults health state; `200` when ready, otherwise `503` + JSON.  |

A not-ready probe returns a JSON body naming the state:

```shell
curl -i http://127.0.0.1:8081/ready
# HTTP/1.1 503 Service Unavailable
# {"status":"not ready"}
```

## Error contract

Every non-2xx response carries a JSON body with `Content-Type: application/json`
so programmatic callers can pick the failure apart:

```json
{
  "error":   "snippet_not_found",
  "message": "snippet \"missing\" not found",
  "snippet": "missing"
}
```

The `error` field is a stable identifier — callers match on it, and these
strings do not change. The `message` field carries human-readable detail. The
`snippet` field echoes the requested name when one was parsed, and is omitted
otherwise.

| `error`                  | HTTP status | When                                                          |
|--------------------------|------------:|---------------------------------------------------------------|
| `method_not_allowed`     | `405`       | Anything other than `GET` on the endpoint.                    |
| `snippet_not_found`      | `404`       | The requested snippet name resolves to no file.              |
| `evaluation_timeout`     | `504`       | Evaluation exceeded `--evaluation-timeout`.                    |
| `evaluation_unavailable` | `503`       | The concurrent-eval cap (`--max-concurrent-evals`) is full.   |
| `evaluation_failed`      | `400`       | go-jsonnet returned an error (syntax, missing import, stack-limit exceeded). |

For `evaluation_failed`, `message` is the raw go-jsonnet diagnostic, including
the file and line numbers from the snippet on disk. That diagnostic can name
on-disk paths, so treat it as cluster-internal detail.

A client that closes the connection mid-evaluation receives no body and no
status line — the handler detects the cancellation and returns without writing
anything.
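
Since the `error` identifiers are stable, a caller can branch on them rather
than on the message text. The following sketch shows one way to do so. The retry
policy is an assumption made for illustration (the documentation does not
prescribe one), and `RenderError` and `parse_response` are hypothetical names.

```python
import json
from typing import Optional

# Illustrative policy: only a full concurrent-eval cap is treated as transient.
RETRYABLE = {"evaluation_unavailable"}


class RenderError(Exception):
    """A non-2xx response decoded from the documented JSON error body."""

    def __init__(self, status: int, error: str, message: str,
                 snippet: Optional[str] = None):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error      # stable identifier, safe to match on
        self.message = message  # human-readable detail, not for matching
        self.snippet = snippet  # None when the server omitted the field

    @property
    def retryable(self) -> bool:
        return self.error in RETRYABLE


def parse_response(status: int, body: bytes):
    """Return the rendered document, or raise RenderError for a failure."""
    payload = json.loads(body)
    if 200 <= status < 300:
        return payload
    raise RenderError(
        status,
        payload.get("error", "unknown"),
        payload.get("message", ""),
        payload.get("snippet"),
    )
```

Keeping the HTTP call out of `parse_response` leaves the decoding logic usable
with any client library.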

The timeout, stack, and concurrency caps that drive `evaluation_timeout` and
`evaluation_unavailable` are documented in
[Evaluation and security](/usage/evaluation-and-security/). To pass values into
a render, see
[External variables and TLAs](/usage/external-variables-and-tlas/).


---

# Service mesh

Source: https://jaas.projects.metio.wtf/usage/service-mesh/


The Helm chart ships an opt-in service-mesh authorization layer for the JaaS pod.
It is off by default and renders only when `serviceMesh.enabled` is `true`. Where
the [network policy](/usage/network-policy/) operates at L3/L4 — which pods and IP
ranges may reach which ports — the service mesh operates at L7 with **identity-based
authorization** and **mTLS**: it asks *which mesh identity* is calling, proven by a
cryptographic workload certificate, not merely which IP the packet came from.

`serviceMesh` and `networkPolicy` are separate fields and compose additively. They
solve different problems and are best enabled together: the network policy draws the
L3/L4 perimeter, and the mesh authorizes meshed callers by SPIFFE identity on top of
it. Enabling one does not require the other, and neither weakens the other.

## Opt-in and explicit engine

`serviceMesh.engine` selects which mesh dialect the chart renders. It is explicit,
not auto-detected: a chart that sniffed the running mesh would render different
objects on different clusters from identical values, which breaks GitOps determinism.
You name the engine, and the rendered manifest is the same everywhere.

| `engine` | Renders | API |
| --- | --- | --- |
| `istio` (default) | `AuthorizationPolicy` + (optional) `PeerAuthentication` | `security.istio.io/v1` |
| `linkerd` | `Server` + `AuthorizationPolicy` + `MeshTLSAuthentication` | `policy.linkerd.io` |

The rendered objects are inert unless the named mesh is actually installed and the
JaaS pod is injected into it. Enabling `serviceMesh` on an un-meshed pod renders the
manifests but changes nothing about how traffic flows.

```yaml
serviceMesh:
  enabled: true
  engine: istio
```
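
For the rendered objects to take effect, the JaaS pod has to be injected into the
mesh. A minimal sketch of namespace-level injection, assuming JaaS is installed
into a namespace called `jaas-system` (a hypothetical name). The label and
annotation are the meshes' own opt-in markers, not chart values:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: jaas-system              # hypothetical install namespace
  labels:
    istio-injection: enabled     # Istio: inject sidecars into pods in this namespace
  # For Linkerd, use this annotation instead of the label above:
  # annotations:
  #   linkerd.io/inject: enabled
```

Pods that were already running before the marker was added usually need a restart
to pick up the proxy.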

## Per-port authorization

Each mesh-reachable port carries a `from` list naming the mesh identities allowed to
call it. An **empty `from` list leaves that port open** to any meshed caller,
mirroring `networkPolicy`'s empty-`from`-is-open semantics; a non-empty list
restricts the port to the listed identities.

The mesh-reachable ports are the Jsonnet HTTP port, the storage port, and the metrics
port:

| Port | Mode | `from` restricts |
|---|---|---|
| Jsonnet HTTP (`ports.http`, `8080`) | always | Callers of the `/jsonnet` endpoint |
| Storage HTTP (`ports.storage`, `8082`) | operator | Flux consumers that dereference `ExternalArtifact` tarballs |
| Metrics (`ports.metrics`, `8083`) | operator + metrics | Prometheus scraping `/metrics` |

A `from` entry is a source matcher with two fields:

- **`principals`** — SPIFFE/mesh identities. On Istio these are
  `source.principals` (`cluster.local/ns/<ns>/sa/<sa>`). On Linkerd they map to
  `MeshTLSAuthentication` identities (`<sa>.<ns>.serviceaccount.identity.linkerd.cluster.local`,
  or `*`).
- **`namespaces`** — source namespaces (Istio `source.namespaces`). **Istio-only** —
  Linkerd authenticates by workload identity, not by namespace, so this field is
  ignored under the `linkerd` engine.

```yaml
serviceMesh:
  enabled: true
  engine: istio
  # Restrict the storage port to the Flux controllers that fetch artifacts.
  storage:
    from:
      - principals:
          - cluster.local/ns/flux-system/sa/kustomize-controller
          - cluster.local/ns/flux-system/sa/helm-controller
  # Scope metrics scraping to the monitoring namespace.
  metrics:
    from:
      - namespaces:
          - monitoring
  # Leave http open so any meshed caller can reach the renderer.
  http:
    from: []
```
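
The same per-port shape under the `linkerd` engine might look like the sketch
below. It assumes the default `cluster.local` trust domain and a Prometheus
ServiceAccount named `prometheus` in `monitoring`, both of which are illustrative.
Because `namespaces` is ignored under Linkerd, the metrics port is scoped by
identity instead:

```yaml
serviceMesh:
  enabled: true
  engine: linkerd
  # Restrict the storage port to the kustomize-controller's Linkerd identity.
  storage:
    from:
      - principals:
          - kustomize-controller.flux-system.serviceaccount.identity.linkerd.cluster.local
  # `namespaces` has no effect here, so name the scraper's identity explicitly.
  metrics:
    from:
      - principals:
          - prometheus.monitoring.serviceaccount.identity.linkerd.cluster.local  # hypothetical
  # Leave http open to any meshed caller.
  http:
    from: []
```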

## Non-mesh clients keep working

The kube-apiserver (which dials the admission webhook) and the kubelet (which dials
the readiness, liveness, and startup probes) are **not part of the mesh**. They carry
no mesh identity and present no workload certificate, so any authorization rule that
demanded one would reject them — admission would break and probes would fail.

The chart deliberately leaves the **webhook and management ports open** so these
non-mesh clients always connect:

- **Istio** adds an allow-any rule covering the webhook (`9443`) and management
  (`8081`) ports, so no identity is required on them.
- With **`mtls: strict`**, the chart additionally sets `portLevelMtls: PERMISSIVE` on
  those two ports, so a plaintext connection from the apiserver or kubelet is still
  accepted even while every other port enforces mTLS.
- **Linkerd** renders no `Server` for those ports, leaving them outside the mesh's
  authorization scope entirely.

This is why admission and probes keep working with the mesh fully enabled.
Authenticity on the webhook port is enforced by TLS and the CA bundle on the
`ValidatingWebhookConfiguration`, not by the mesh — see the
[admission webhook page](/usage/admission-webhook/).

## mTLS

`serviceMesh.mtls` sets the mTLS posture and applies to the **Istio engine only**;
Linkerd negotiates mTLS automatically between meshed pods, so the knob is ignored
there.

| `mtls` | Effect (Istio) |
| --- | --- |
| `""` (default) | Defers to the mesh's own default (mesh-wide `PeerAuthentication` / `MeshConfig`) — no `PeerAuthentication` is rendered |
| `permissive` | Renders a `PeerAuthentication` accepting both mTLS and plaintext |
| `strict` | Requires mTLS on the workload's ports, **except** the webhook and management ports, which get a port-level `PERMISSIVE` carve-out |

```yaml
serviceMesh:
  enabled: true
  engine: istio
  mtls: strict
```

Under `strict`, every mesh-reachable port enforces mTLS while the apiserver and
kubelet still reach the webhook and probes over plaintext via the carve-out above.
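
For orientation, the `strict` posture corresponds roughly to an Istio
`PeerAuthentication` like the one below. This is a sketch of the shape, not the
chart's literal output: the object name, namespace, and selector labels are
assumptions, and only the mode and the two port-level carve-outs are taken from
the table above.

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: jaas                           # hypothetical
  namespace: jaas-system               # hypothetical
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: jaas     # hypothetical; port-level settings need a selector
  mtls:
    mode: STRICT                       # every port requires mTLS by default
  portLevelMtls:
    9443:
      mode: PERMISSIVE                 # webhook: the kube-apiserver is not in the mesh
    8081:
      mode: PERMISSIVE                 # management probes: the kubelet is not in the mesh
```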

## Default-deny

`serviceMesh.defaultDeny.enabled` (default `false`) additionally renders a
namespace-wide default-deny so every pod in the install namespace rejects
unauthorized mesh traffic and the per-workload allows become the only exceptions (a
zero-trust namespace).

- **Istio** renders an empty-spec `AuthorizationPolicy` (an `ALLOW` policy with no
  rules, which matches nothing) scoped to the whole namespace; `ALLOW` policies are
  additive, so the per-port allows still admit traffic to the JaaS pod while
  everything else is denied.
- **Linkerd** has no per-object deny-all; the namespace default is set via the
  `config.linkerd.io/default-inbound-policy` annotation, stamped onto the
  chart-managed Namespace (requires `namespace.create=true`) — otherwise annotate the
  namespace out of band.

Enable this **only when JaaS owns its namespace**, because the deny-all also denies
every co-located workload that does not have its own allowing authorization.

```yaml
serviceMesh:
  enabled: true
  engine: istio
  defaultDeny:
    enabled: true   # only when JaaS owns this namespace
```
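
Where the chart does not manage the Namespace, the Linkerd default can be set by
hand. A minimal sketch, assuming the install namespace is `jaas-system`
(hypothetical) and that `deny` is the desired default:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: jaas-system                                   # hypothetical install namespace
  annotations:
    config.linkerd.io/default-inbound-policy: deny    # reject traffic with no matching authorization
```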

## Native passthrough

Anything the per-port `from` knobs cannot express goes into the engine's native
passthrough list, merged verbatim into the rendered objects.

Under the `istio` engine, `serviceMesh.istio.rules` are merged into the
`AuthorizationPolicy`'s `spec.rules` (the `security.istio.io/v1` rule schema) — use it
for path/method matchers, `when` JWT-claim conditions, `ipBlocks`, and similar:

```yaml
serviceMesh:
  enabled: true
  engine: istio
  istio:
    rules:
      - from:
          - source:
              requestPrincipals:
                - "https://accounts.example.com/*"
        to:
          - operation:
              paths:
                - /jsonnet/*
```

Under the `linkerd` engine, `serviceMesh.linkerd.authorizations` are appended verbatim
as additional documents after the rendered `Server` / `AuthorizationPolicy` /
`MeshTLSAuthentication` set — each entry must be a complete `policy.linkerd.io` object:

```yaml
serviceMesh:
  enabled: true
  engine: linkerd
  linkerd:
    authorizations:
      - apiVersion: policy.linkerd.io/v1alpha1
        kind: AuthorizationPolicy
        metadata:
          name: jaas-extra
        spec:
          targetRef:
            group: policy.linkerd.io
            kind: Server
            name: jaas-http
          requiredAuthenticationRefs:
            - kind: ServiceAccount
              name: custom-caller
              namespace: tenant-a
```

## Which traffic gets authorized

The mesh authorizes only meshed callers on the mesh-reachable ports; the non-mesh
ports stay open by design so the control plane keeps working.

| Port | Authorized by the mesh? | Why |
|---|---|---|
| Jsonnet HTTP (`8080`) | Yes — `serviceMesh.http.from` | Meshed callers of the renderer |
| Storage HTTP (`8082`) | Yes — `serviceMesh.storage.from` | Meshed Flux consumers |
| Metrics (`8083`) | Yes — `serviceMesh.metrics.from` | Meshed Prometheus scrapers |
| Webhook (`9443`) | No — open carve-out | The kube-apiserver is not in the mesh |
| Management probes (`8081`) | No — open carve-out | The kubelet is not in the mesh |

For the full set of chart values, see
[Helm chart values](/installation/helm-values/).


---

# Snippet sources

Source: https://jaas.projects.metio.wtf/usage/snippet-sources/


A `JsonnetSnippet` declares exactly one source for its Jsonnet bytes: either
inline `spec.files` or a `spec.sourceRef` pointing at a Flux source. Admission
rejects a snippet that sets both or neither. The operator resolves the source
into an in-memory file tree, evaluates `spec.entryFile` within it, and publishes
the result.

## Inline files

`spec.files` is a map of filename to Jsonnet source. The operator evaluates the
entry file (`spec.entryFile`, default `main.jsonnet`) against the rest of the
map. This is the simplest source — the snippet is self-contained, with no
external dependency to fetch:

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: hello-world
  namespace: default
spec:
  serviceAccountName: hello-world-tenant
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      {
        greeting: 'hello',
        recipient: std.extVar('audience'),
      }
  externalVariables:
    audience: world
```

## A Flux source

`spec.sourceRef` points at a Flux source CR whose artifact tarball the operator
fetches and extracts into the snippet's file tree. The `kind` is one of
`GitRepository`, `OCIRepository`, `Bucket`, or `ExternalArtifact` — see Flux's
[source-controller documentation](https://fluxcd.io/) for how each source CR
publishes its artifact.

When the referenced source republishes — a new commit lands on the
`GitRepository`, a new tag pushes to the `OCIRepository` — the operator's watch
on Flux source kinds re-queues the snippet and re-renders it. No `spec.interval`
is required for this; the watch is event-driven.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: dashboards-source
  namespace: default
spec:
  interval: 5m
  url: https://github.com/example-org/grafana-dashboards
  ref:
    branch: main
---
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency-dashboard
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  entryFile: dashboards/api-latency.jsonnet
  sourceRef:
    kind: GitRepository
    name: dashboards-source
    path: dashboards/
```

`spec.sourceRef.path` narrows extraction to a subdirectory of the artifact's
tarball. Empty means the whole tree. The tenant ServiceAccount needs `get` on
the referenced source kind — see [Tenancy and RBAC](/usage/tenancy-and-rbac/).
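
The other source kinds follow the same pattern. A sketch with an `OCIRepository`,
assuming an artifact at a hypothetical `oci://ghcr.io/example-org/dashboards` and
a Flux release that serves `OCIRepository` at `source.toolkit.fluxcd.io/v1` (older
releases use `v1beta2`):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1   # v1beta2 on older Flux releases
kind: OCIRepository
metadata:
  name: dashboards-oci
  namespace: default
spec:
  interval: 5m
  url: oci://ghcr.io/example-org/dashboards   # hypothetical artifact
  ref:
    tag: latest
---
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency-dashboard-oci
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  entryFile: main.jsonnet
  sourceRef:
    kind: OCIRepository
    name: dashboards-oci
```

Here the tenant ServiceAccount would need `get` on `ocirepositories` rather than
`gitrepositories`.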

### The entry file and multi-snippet trees

`spec.entryFile` names the file — relative to the resolved source root — that
go-jsonnet evaluates. It defaults to `main.jsonnet`. The field is restricted to
relative `[A-Za-z0-9._/-]+` paths with no `..` segments, so it cannot traverse
out of the extracted tree.
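
Applied to the rule above, a few candidate values (illustrative, not taken from
the project's test suite) sort as follows:

```yaml
spec:
  entryFile: dashboards/api-latency.jsonnet    # accepted: relative, allowed characters only
  # entryFile: ../shared/main.jsonnet          # rejected: contains a `..` segment
  # entryFile: /srv/main.jsonnet               # rejected: absolute path
  # entryFile: "api latency.jsonnet"           # rejected: space is outside [A-Za-z0-9._/-]
```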

One Flux source often carries many snippets. A shared dashboards repository, for
example, holds one `.jsonnet` file per dashboard. Rather than one source per
dashboard, point several `JsonnetSnippet` resources at the same `GitRepository`
and give each a different `entryFile`:

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency-dashboard
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  entryFile: dashboards/api-latency.jsonnet
  sourceRef:
    kind: GitRepository
    name: dashboards-source
---
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: error-budget-dashboard
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  entryFile: dashboards/error-budget.jsonnet
  sourceRef:
    kind: GitRepository
    name: dashboards-source
```

Both snippets share the source fetch and re-render together when the repository
republishes, but each publishes its own `ExternalArtifact` from its own entry
file.

## Chaining snippets

A `JsonnetSnippet` can source from the `ExternalArtifact` another snippet
publishes. This composes a pipeline of renders: snippet A evaluates and
publishes its JSON, and snippet B takes that JSON as its input, transforms it,
and publishes a second artifact. A downstream consumer deploys only the final
artifact.

Chaining works because the `ExternalArtifact` is a Flux source like any other.
Snippet B sets `spec.sourceRef` with `kind: ExternalArtifact` and `name` pointing
at the producing snippet — an `ExternalArtifact` is published under the producing
`JsonnetSnippet`'s name. The operator fetches A's artifact tarball into B's file
tree. In the default `rendered` output mode that tarball holds a single
`rendered.json`, so snippet B sets `entryFile: rendered.json` to evaluate A's
output. Because JSON is valid Jsonnet, B's entry file can extend the imported
object directly:

```yaml
# Snippet A renders a shared config blob other snippets consume.
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: base-config
  namespace: default
spec:
  serviceAccountName: chained-tenant
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      {
        cluster: 'prod',
        region: 'eu-west-1',
        retentionDays: 30,
      }
---
# Snippet B sources from base-config's ExternalArtifact and extends it.
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: derived-config
  namespace: default
spec:
  serviceAccountName: chained-tenant
  entryFile: rendered.json
  sourceRef:
    kind: ExternalArtifact
    name: base-config
```

`derived-config` re-emits `base-config`'s rendered JSON as its own artifact. The
operator's watch on `ExternalArtifact` updates re-queues `derived-config`
whenever `base-config` republishes, so the pipeline stays current end to end.

### Source variant

The rendered variant above passes evaluated JSON downstream. The source variant
passes raw Jsonnet downstream instead, for snippet B to re-evaluate itself.

Snippet A sets `spec.output: source`, so its `ExternalArtifact` carries A's raw
`.jsonnet` / `.libsonnet` files rather than the evaluated JSON. Snippet B points
`spec.sourceRef` at A's `ExternalArtifact` and imports A's files as Jsonnet,
re-evaluating them with B's own external variables, TLAs, and libraries. A
becomes a source that the pipeline produces dynamically rather than one an
operator authors by hand:

```yaml
# Snippet A publishes its raw Jsonnet, not its evaluated output.
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: dashboard-template
  namespace: default
spec:
  serviceAccountName: chained-tenant
  output: source
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      function(env='dev') {
        title: 'API latency — ' + env,
        refresh: if env == 'prod' then '30s' else '5m',
      }
---
# Snippet B sources A's raw Jsonnet and re-evaluates it with its own TLAs.
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: dashboard-prod
  namespace: default
spec:
  serviceAccountName: chained-tenant
  entryFile: main.jsonnet
  sourceRef:
    kind: ExternalArtifact
    name: dashboard-template
  tlas:
    env:
      - prod
```

Because A published `source` output, B's `sourceRef` extracts A's raw
`main.jsonnet` into B's file tree, and B evaluates it as the entry file with
`env=prod` supplied as a TLA. When A's template changes, A republishes and the
`ExternalArtifact` watch re-renders B against the new Jsonnet.

Choose between the two by what the downstream snippet needs: rendered chaining
passes JSON data downstream; source chaining passes Jsonnet to be re-evaluated
downstream. For when to reach for a `source`-output snippet instead of a
`JsonnetLibrary`, see
[JsonnetLibrary vs a source-output snippet](/usage/jsonnet-libraries/#jsonnetlibrary-vs-a-source-output-snippet).

The tenant ServiceAccount needs `get` on
`externalartifacts.source.toolkit.fluxcd.io` for both variants; see
[Tenancy and RBAC](/usage/tenancy-and-rbac/).

### Cycle detection

A snippet cannot transitively depend on itself. The operator walks the
dependency graph — `spec.sourceRef` edges to `ExternalArtifact`s and their
producing snippets, plus `spec.libraries` edges through `JsonnetLibrary`
`sourceRef`s — at reconcile time, before any tenant work. If the walk revisits
the snippet it started from, the operator refuses to publish and reports
`Ready=False` with reason `DependencyCycle`. This catches a chain that feeds
back into itself directly or through a library, so a cycle surfaces as a clear
status condition rather than an endless re-render loop.
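
As an illustration, the two hypothetical snippets below source from each other's
`ExternalArtifact`. Following the description above, the walk from either one
arrives back at its starting point, so neither should publish:

```yaml
# cycle-a reads cycle-b's artifact ...
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: cycle-a
  namespace: default
spec:
  serviceAccountName: chained-tenant
  entryFile: rendered.json
  sourceRef:
    kind: ExternalArtifact
    name: cycle-b
---
# ... and cycle-b reads cycle-a's artifact, closing the loop.
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: cycle-b
  namespace: default
spec:
  serviceAccountName: chained-tenant
  entryFile: rendered.json
  sourceRef:
    kind: ExternalArtifact
    name: cycle-a
# Expected per the text above: Ready=False with reason DependencyCycle on both.
```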

## Related pages

- [Jsonnet libraries](/usage/jsonnet-libraries/) — reusable `.libsonnet` files
  referenced via `spec.libraries`.
- [Tenancy and RBAC](/usage/tenancy-and-rbac/) — the verbs the tenant
  ServiceAccount needs for each source kind.


---

# Snippets and libraries

Source: https://jaas.projects.metio.wtf/usage/snippets-and-libraries/


In renderer mode you declare snippets and libraries on disk through command-line
flags. A snippet becomes reachable at the
[rendering endpoint](/usage/rendering-endpoint/); a library is importable by any
snippet.

## Directory snippets

Point `--snippet-directory` at a directory whose subdirectories each hold a
`main.jsonnet`. Each subdirectory name becomes a snippet name:

```shell
./jaas --snippet-directory examples/snippets/dashboards
curl http://127.0.0.1:8080/jsonnet/example1
```

Given this layout, `example1` resolves to
`examples/snippets/dashboards/example1/main.jsonnet`:

```text
examples/snippets/dashboards
├── example1
│   └── main.jsonnet
├── tla-example
│   └── main.jsonnet
└── multi-tla
    └── main.jsonnet
```

## File snippets

Point `--snippet` at an individual Jsonnet file. The file path becomes the
snippet name:

```shell
./jaas --snippet examples/snippets/example.jsonnet
curl http://127.0.0.1:8080/jsonnet/examples/snippets/example.jsonnet
```

Both `--snippet` and `--snippet-directory` are repeatable, so one process serves
several roots:

```shell
./jaas \
  --snippet-directory examples/snippets/dashboards \
  --snippet examples/snippets/example.jsonnet
```

## Libraries

Point `--library-path` at a directory that holds importable Jsonnet libraries. A
snippet imports a library by its path under that directory:

```text
examples/libraries
├── examplonet
│   └── main.libsonnet
└── text
    └── welcome.txt
```

```shell
./jaas \
  --snippet-directory examples/snippets/dashboards \
  --library-path examples/libraries
```

A snippet then imports the library with a string-literal path:

```jsonnet
local examplonet = import 'examplonet/main.libsonnet';

{
  person1: {
    name: examplonet.standard,
    welcome: 'Hello ' + self.name + '!',
  },
}
```

`--library-path` is repeatable. When the same import path matches under more than
one library directory, the rightmost matching directory wins — list the override
directory last.

## Embedding non-Jsonnet files

Use `importstr` to pull the raw contents of a file under a library path into a
snippet as a string. The `embed-text` example reads a text file from the
`text/` library:

```jsonnet
{
  banner: importstr 'text/welcome.txt',
  length: std.length(self.banner),
}
```

Any file reachable under a `--library-path` directory or the snippet's own
directory can be `import`-ed or `importstr`-ed by any snippet. Scope these
directories tightly — see
[Evaluation and security](/usage/evaluation-and-security/).

For the operator-side equivalent — `JsonnetLibrary` CRDs and OCI-mounted shared
libraries — see [Jsonnet libraries](/usage/jsonnet-libraries/).


---

# Storage and high availability

Source: https://jaas.projects.metio.wtf/usage/storage-and-ha/


In [operator mode](/usage/operator-mode/) JaaS renders each `JsonnetSnippet` into
a tar.gz artifact, stores it, and publishes an `ExternalArtifact` CR that points a
Flux consumer at the tarball over HTTP. JaaS publishes artifacts through one of two
storage backends — local filesystem or S3-compatible object storage — with optional
leader election for multi-replica high availability and configurable revision
retention.

## Serving the tarballs

Regardless of backend, the operator runs an HTTP server that Flux consumers fetch
artifacts from. Three flags govern it, and `--storage-base-url` and
`--storage-path` are required whenever `--enable-flux-integration` is set:

- `--storage-base-url` — the public URL prefix stamped into each
  `ExternalArtifact`'s `status.artifact.url`. This is what downstream Flux
  controllers dial, so it must be reachable from them.
- `--storage-listen-address` (default `0.0.0.0`) and `--storage-port` (default
  `8082`) — the bind address of the storage HTTP server.

## Local backend

`--storage-backend=local` (the default) writes tarballs to the filesystem under
`--storage-path`. The Helm chart pairs this with an `emptyDir` by default, or a
`PersistentVolumeClaim` when `operator.storage.persistence.enabled: true`.

A ReadWriteOnce PVC effectively caps the install at a single replica, because the
volume can be mounted read-write by only one node at a time. If you need more than
one replica, use the S3 backend below.

## S3 backend

`--storage-backend=s3` stores tarballs in any S3-compatible bucket (AWS S3, MinIO,
Ceph RGW, Backblaze B2, and similar). The bucket must already exist. Configure it
with:

| Flag | Purpose |
|---|---|
| `--s3-endpoint` | S3 host:port, e.g. `s3.amazonaws.com` or `minio.minio.svc:9000`. Required. |
| `--s3-bucket` | Bucket the artifacts live in. Required. |
| `--s3-prefix` | Optional key prefix so JaaS can coexist with other tenants in one bucket. |
| `--s3-region` | Region the bucket lives in. Required for AWS multi-region; ignored by most other servers. |
| `--s3-use-ssl` | Talk HTTPS to the endpoint (default `true`). Set `false` only for local MinIO over HTTP. |
| `--s3-access-key` | Static access key ID. |
| `--s3-secret-key` | Static secret access key, paired with `--s3-access-key`. |
| `--s3-session-token` | Optional session token for temporary credentials. |
| `--s3-anonymous` | Skip request signing entirely; only for a public bucket, test and dev only. |

Leave `--s3-access-key` and `--s3-secret-key` empty to engage the IAM/IRSA
discovery chain — environment credentials, web-identity tokens, and EC2/EKS
instance metadata — so a pod running with an IRSA-annotated ServiceAccount needs
no static keys.
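
On EKS, the keyless path typically comes down to an annotation on the
ServiceAccount the JaaS pod runs as. A sketch with a hypothetical name, namespace,
and role ARN; how the chart exposes ServiceAccount annotations is not covered
here, so check the Helm values reference:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaas                     # hypothetical: the ServiceAccount of the JaaS pod
  namespace: jaas-system         # hypothetical
  annotations:
    # IRSA: EKS injects a web-identity token for this role into the pod.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/jaas-artifacts  # hypothetical
```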

### Bring your own Secret

The chart never bakes credentials into a rendered Secret. It references a Secret
you provide by name (`operator.storage.s3.credentialsSecret.name`) and consumes it
via `envFrom`, expecting the keys `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and
the optional `AWS_SESSION_TOKEN`. The Secret's provenance is yours to choose.

Any of these can produce that Secret, and the chart works with all of them
unchanged — point the tool at the same name the chart references:

- **External Secrets Operator** — an `ExternalSecret` that syncs from Vault, AWS
  Secrets Manager, GCP Secret Manager, or Azure Key Vault into a Secret of that
  name.
- **Sealed Secrets** — a `SealedSecret` that the controller decrypts in-cluster.
- **Vault Agent / CSI** — a Secret materialized from Vault.
- **SOPS** — a Secret decrypted by your GitOps tooling at apply time.
- **`kubectl create secret`** — a plain hand-managed Secret.

This is why the chart ships no native `ExternalSecret` resource: the reference seam
already integrates with every secret backend, without coupling the chart to one
operator's CRDs.

On the cloud, **IAM/IRSA** — leaving the credentials Secret unset (above) — avoids
a stored secret entirely and is preferred where available.

A minimal External Secrets example whose `target.name` matches the referenced
Secret and whose keys are the ones JaaS expects:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: jaas-s3
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: SecretStore
  target:
    name: jaas-s3-credentials # = operator.storage.s3.credentialsSecret.name
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: jaas/s3
        property: access_key_id
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: jaas/s3
        property: secret_access_key
```
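
The hand-managed equivalent is a plain Secret with the same name and keys. A
sketch with placeholder values; the namespace is an assumption and should match
the JaaS install namespace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: jaas-s3-credentials      # = operator.storage.s3.credentialsSecret.name
  namespace: jaas-system         # hypothetical
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key-id>
  AWS_SECRET_ACCESS_KEY: <secret-access-key>
  # AWS_SESSION_TOKEN: <session-token>   # optional, for temporary credentials
```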

For the full chart values, see the [Helm values](/installation/helm-values/)
reference.

## Leader election

Leader election is on by default in operator mode (`--leader-election`, honored
only when `--enable-flux-integration` is set). The lease lets exactly one replica
reconcile at a time. On `SIGTERM` during a rolling update the lease is released
immediately rather than waiting out the 15-second lease duration, so the next
replica picks up reconciliation within seconds.

Set `--leader-election=false` only when running a single replica with no rollout
overlap.

## Multi-replica HA

High availability is the S3 backend plus leader election: every replica reads from
the same bucket, and only the lease-holder writes. No ReadWriteMany storage class
is required. During a rolling update the lease hands over on `SIGTERM`, so the
write path moves to the new leader without a manual step.

## Revision retention and rollback

`spec.history` on a `JsonnetSnippet` (default `1`, maximum `50`) keeps the last N
rendered revisions in storage. Downstream consumers can pin to an older `sha256`
for rollback or blue-green cutover, instead of always tracking the newest render.
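
For instance, a snippet that keeps five revisions available for rollback could be
declared like this (the value `5` is illustrative, and the remaining fields are
reused from the snippet-sources examples):

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: api-latency-dashboard
  namespace: default
spec:
  serviceAccountName: dashboards-tenant
  history: 5                     # keep the five most recent rendered revisions
  entryFile: dashboards/api-latency.jsonnet
  sourceRef:
    kind: GitRepository
    name: dashboards-source
```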

Two flags bound how long superseded revisions linger and how large an artifact may be:

- `--artifact-gc-grace` (default `5m`) retains a revision for a short window after
  it leaves the keep-set. This closes the pin→fetch race in which a Flux consumer
  reads `status.artifact` a moment before the operator garbage-collects the
  revision that consumer pinned. Set it to `0` to disable the grace and restore
  eager pruning. Snippet teardown (the deletion path) is unaffected by this flag.
- `--max-artifact-bytes` (default `0`, disabled) caps the rendered artifact size in
  bytes. A snippet whose render exceeds the cap fails with `ReasonArtifactTooLarge`
  rather than publishing an oversized tarball.

## Orphan-tmp sweep

This is a local-backend concern only. On the filesystem backend a `Put` that
dies after writing the temporary file but before the atomic rename leaves a
`<rev>.tar.gz.tmp` residue, and a background sweep removes it. The S3 backend has
no such residue because `PutObject` is atomic, so its sweep is a no-op. Two flags
tune the sweep:

- `--storage-sweep-interval` (default `10m`) — how often the sweep runs. `0`
  disables it.
- `--storage-sweep-max-tmp-age` (default `30m`) — the minimum age before an
  orphaned `.tmp` file is eligible, set wider than the longest plausible in-flight
  `Put` so the sweep never races a live writer.

For production sizing of these knobs, see the
[production guide](/installation/production/). The full flag list with defaults is
on the [configuration page](/installation/configuration/).


---

# Tenancy and RBAC

Source: https://jaas.projects.metio.wtf/usage/tenancy-and-rbac/


In [operator mode](/usage/operator-mode/) the JaaS operator never acts with its
own broad privileges when touching tenant resources. Every reconcile of a
`JsonnetSnippet` runs against the RBAC of a tenant ServiceAccount, so a snippet
can only reach what its own ServiceAccount is allowed to reach.

## Per-snippet impersonation

Each `JsonnetSnippet` carries a `spec.serviceAccountName`. On every reconcile the
operator mints a short-lived Bearer token for that ServiceAccount through the
Kubernetes TokenRequest API (`serviceaccounts/token: create`) and performs all
tenant-side API calls — reading `JsonnetLibrary` objects, fetching Flux source
artifacts, and writing the published `ExternalArtifact` — as that ServiceAccount.
The operator does not use the `impersonate` verb; it uses a real token, so the
apiserver evaluates the tenant's own RBAC.

When a snippet omits `spec.serviceAccountName`, the operator falls back to the
ServiceAccount named in `--default-service-account`. If that flag is also empty,
such a snippet is rejected at reconcile time rather than silently running with
elevated rights. Set `--default-service-account` to a low-privilege account if
you want snippets without an explicit ServiceAccount to reconcile at all.

## The operator's own ClusterRole

Because every tenant-side call is the tenant's, the operator's own ClusterRole
stays minimal:

- `serviceaccounts/token: create` — to mint the Bearer tokens above.
- `get`/`list`/`watch` on `customresourcedefinitions.apiextensions.k8s.io` — the
  CRD watcher subscribes to the cluster's CRD stream so that Flux source-kind
  watches engage automatically when a previously-absent CRD becomes established,
  without a process restart.
- Watch verbs on the JaaS CRDs (`JsonnetSnippet`, `JsonnetLibrary`) and on the
  Flux source kinds it chains from (`GitRepository`, `OCIRepository`, `Bucket`,
  `ExternalArtifact`).

The operator does not need `create`/`update`/`patch` on `ExternalArtifact` in its
own ClusterRole — that write is done as the tenant, so the verb lives on the
tenant Role below.

## The tenant Role

The ServiceAccount each snippet runs as needs explicit verbs, or the first
reconcile fails with `Forbidden` and the failure points at the wrong cause. Grant
this Role in the tenant's namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant
rules:
  # Required: the operator writes the snippet's ExternalArtifact as
  # the tenant ServiceAccount. Without these the publish step is denied.
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
  # Required only when the snippet uses spec.libraries (JsonnetLibrary refs).
  - apiGroups: [jaas.metio.wtf]
    resources: [jsonnetlibraries]
    verbs: [get, list]
  # Required only when the snippet uses spec.sourceRef. Grant only
  # the source kinds your tenants actually reference.
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [gitrepositories, ocirepositories, buckets, externalartifacts]
    verbs: [get]
```

Notes on each rule:

- The `externalartifacts` write verbs (`create`, `update`, `patch`) are
  mandatory. The operator writes the published artifact CR through the
  impersonating client on purpose, so one tenant Role governs both source-side
  reads and artifact-side writes.
- The `jsonnetlibraries` rule is needed only when a snippet references libraries
  through `spec.libraries`. See [snippet sources](/usage/snippet-sources/) for how
  libraries reach a snippet.
- The source-kind `get` rule is needed only when a snippet has a `spec.sourceRef`.
  Grant only the kinds your tenants reference. The `externalartifacts` entry here
  covers chained snippets — snippet B reading the `ExternalArtifact` snippet A
  publishes.
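
A tenant whose snippets use only inline `spec.files`, with no `spec.libraries` and
no `spec.sourceRef`, should by the notes above need just the first rule. A trimmed
sketch (the Role name is hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant-inline       # hypothetical name
rules:
  # Publish-only: enough for snippets that neither import JsonnetLibrary
  # objects nor read a Flux source.
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
```

Add the library and source-kind rules back as soon as a snippet in that namespace
starts using `spec.libraries` or `spec.sourceRef`.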

## Binding per namespace

For namespace-scoped multitenancy, bind the Role to each tenant ServiceAccount in
its own namespace with a `RoleBinding`:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jaas-tenant
subjects:
  - kind: ServiceAccount
    name: <tenant-service-account>
    namespace: <tenant-namespace>
```

Each tenant namespace gets its own `Role` + `RoleBinding`, so a snippet's blast
radius is its own namespace's grants.

## Single-tenant clusters

When a cluster runs only your own workloads and snippets do not need isolating from
each other, a Role per namespace is more than you need. The operator still
impersonates a ServiceAccount — it never applies with its own identity — so the
simplest setup is one shared account:

1. Create a single ServiceAccount and grant it the rights your snippets need. On a
   single-tenant cluster that can be broad: a `ClusterRoleBinding` to the built-in
   `cluster-admin` ClusterRole lets any snippet read any source and publish into
   any namespace.

   ```yaml
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: jaas-snippets
     namespace: jaas-system
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: jaas-snippets-admin
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: cluster-admin
   subjects:
     - kind: ServiceAccount
       name: jaas-snippets
       namespace: jaas-system
   ```

2. Point the operator's `--default-service-account` at it (set through the chart's
   operator values), and leave `spec.serviceAccountName` off your snippets. Every
   snippet then reconciles as that one account.

This trades isolation for simplicity — every snippet has the same rights, so use it
only where you trust every snippet author. To move to multitenancy later, give
individual snippets their own `spec.serviceAccountName` scoped to a tenant Role as
above; anything still relying on the default keeps working.

## Restricting cross-namespace references

`--no-cross-namespace-refs` defaults to `true`: a `JsonnetSnippet` or
`JsonnetLibrary` whose `sourceRef` targets a different namespace is rejected. Keep
this on for multitenancy — it stops one tenant from pointing a snippet at another
tenant's source. Set it to `false` only when you operate every namespace yourself
and deliberately want cross-namespace chaining.
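
The rule can be pictured as a small predicate. The following is a minimal Python sketch, not the operator's code: `is_rejected` and its parameters are hypothetical names, and the only behaviour it borrows from this documentation is that an omitted reference namespace falls back to the namespace of the referencing CR.

```python
from typing import Optional


def is_rejected(owner_namespace: str,
                ref_namespace: Optional[str],
                no_cross_namespace_refs: bool = True) -> bool:
    """Return True when a reference would be rejected as cross-namespace."""
    # An omitted namespace defaults to the namespace of the referencing CR.
    target = ref_namespace or owner_namespace
    return no_cross_namespace_refs and target != owner_namespace
```

With the default setting, `is_rejected("team-a", "team-b")` is true, while a reference that omits the namespace is always accepted.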

## Narrowing the watch

Two flags scope which CRs the operator reconciles:

- `--label-selector` narrows the watch to CRs whose labels match the selector.
  Empty (the default) selects every CR in the watched scope. Use it to run an
  operator over only a labelled subset of snippets.
- `--watch-namespaces` (or the `JAAS_WATCH_NAMESPACES` environment variable) takes
  a comma-separated namespace list and restricts the manager's cache to those
  namespaces. Empty (the default) is cluster-wide. The Helm chart's
  `operator.watchNamespaces` mirrors this: when set, it threads the value into the
  deployment's `--watch-namespaces` argument and pivots the rendered RBAC to one
  `RoleBinding` per listed namespace instead of a cluster-wide
  `ClusterRoleBinding`. Cluster-scoped resources (CRDs, the optional
  `ValidatingWebhookConfiguration`) stay bound through a `ClusterRoleBinding`,
  since they are inherently cluster-scoped.
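
The namespace pivot described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the chart's template logic: both function names are hypothetical, trimming whitespace around entries is an assumption, and only the namespaced part of the operator's RBAC is modelled.

```python
def parse_watch_namespaces(value: str) -> list:
    """Split the comma-separated list; an empty result means cluster-wide."""
    # Trimming whitespace and dropping empty entries is an assumption.
    return [ns.strip() for ns in value.split(",") if ns.strip()]


def operator_bindings(namespaces: list) -> list:
    """Return (kind, namespace) pairs for the operator's namespaced RBAC."""
    if not namespaces:
        # Cluster-wide watch: a single ClusterRoleBinding, no namespace.
        return [("ClusterRoleBinding", "")]
    # Scoped watch: one RoleBinding per listed namespace.
    return [("RoleBinding", ns) for ns in namespaces]
```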

The full flag list, with defaults, is on the
[configuration page](/installation/configuration/).


---

# Tracing

Source: https://jaas.projects.metio.wtf/usage/tracing/


The JaaS operator exports OpenTelemetry traces over OTLP gRPC. With an endpoint
configured, each reconcile and the work it fans out into — source fetch, library
resolution, evaluation, publish — becomes a span you can follow in a tracing
backend. When no endpoint is set, the OpenTelemetry SDK runs in no-op mode and
emits nothing, so tracing carries no cost until you opt in.

## The binary

Three flags configure the exporter:

- `--tracing-endpoint` — the OTLP gRPC collector `host:port`, e.g.
  `otel-collector.observability.svc:4317`. Empty (the default) disables tracing
  entirely.
- `--tracing-insecure` — skip TLS when dialing the collector. Default `false`.
  Use only for in-cluster collectors that do not terminate TLS themselves.
- `--tracing-sample-ratio` — TraceID-ratio sampling between `0.0` and `1.0`.
  Default `1.0` samples every trace.

The full flag list with defaults is on the
[configuration page](/installation/configuration/).

### Viewing spans

Point `--tracing-endpoint` at any OTLP-gRPC-speaking collector — the
OpenTelemetry Collector, Jaeger, Tempo, or a vendor agent — and view the spans
in whatever backend that collector feeds. A reconcile span carries the snippet's
namespace and name plus the spec generation it acted on
(`jaas.generation`), so you can search for one snippet and see the latency
breakdown across its fetch, eval, and publish phases. That is the fastest way to
tell a slow upstream source fetch apart from a slow evaluation when a snippet's
reconcile latency climbs.

When a phase fails — source resolution, library resolution, evaluation, or
publish — its span records the error and is marked with an error status, so the
failed span shows up as `status=error` and is directly queryable in the tracing
backend. Searching for error spans surfaces the exact phase a failing reconcile
broke in without reading through the full trace.

On a busy operator, drop `--tracing-sample-ratio` below `1.0` to keep only a
fraction of traces — `0.1` samples one in ten. Leave it at `1.0` while
diagnosing a specific problem so no trace is dropped.
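
TraceID-ratio sampling decides from the trace ID itself, so every span of a trace gets the same verdict. The Python sketch below illustrates that idea only; it is not the OpenTelemetry SDK's code, and `should_sample` is a hypothetical helper.

```python
import random


def should_sample(trace_id: bytes, ratio: float) -> bool:
    """Decide from the trace ID alone whether a trace is kept."""
    if len(trace_id) != 16:
        raise ValueError("trace IDs are 16 bytes")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Map the low 8 bytes onto [0, 2**63) and keep the trace when the
    # value falls below the ratio-scaled upper bound.
    value = int.from_bytes(trace_id[8:], "big") >> 1
    return value < int(ratio * (1 << 63))


# Simulate 10,000 random trace IDs at a ratio of 0.1.
rng = random.Random(0)
ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in ids)
print(f"kept {kept} of {len(ids)} traces")
```

At `1.0` every ID passes the check and at `0.0` none does, which matches the flag's documented range.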

## The Helm chart

Tracing lives under `operator.tracing`. It only takes effect in operator mode
(`operator.enabled: true`):

```yaml
operator:
  enabled: true
  tracing:
    endpoint: otel-collector.observability.svc:4317
    insecure: true
    sampleRatio: 1.0
```

The keys map directly onto the flags: `endpoint` → `--tracing-endpoint`,
`insecure` → `--tracing-insecure`, `sampleRatio` → `--tracing-sample-ratio`.
Leaving `endpoint` empty (the default) keeps the SDK in no-op mode.


---

# API reference

Source: https://jaas.projects.metio.wtf/api/


The wire reference for the two custom resources in the `jaas.metio.wtf/v1` group
and the Flux `ExternalArtifact` JaaS publishes. For task-oriented guidance, start
from [Usage](/usage/).


---

# ExternalArtifact output contract

Source: https://jaas.projects.metio.wtf/api/externalartifact/


For every successfully evaluated `JsonnetSnippet`, the JaaS operator upserts a
Flux `ExternalArtifact` CR (`source.toolkit.fluxcd.io/v1`) in the same namespace
as the snippet. JaaS does not own the `ExternalArtifact` CRD — it is defined and
installed by Flux's source-controller. The full CRD schema is in the
[Flux ExternalArtifact reference](https://fluxcd.io/flux/components/source/externalartifacts/);
below is the subset JaaS writes and the invariants downstream consumers can rely on.

For task-oriented context, see [Operator mode](/usage/operator-mode/).

## What JaaS writes

### `spec.sourceRef`

JaaS stamps a back-pointer to the originating snippet on `spec.sourceRef`:

```yaml
spec:
  sourceRef:
    apiVersion: jaas.metio.wtf/v1
    kind: JsonnetSnippet
    name: <snippet-name>
```

The namespace is always the snippet's own namespace — JaaS never publishes an
`ExternalArtifact` to a different namespace. The three fields
(`apiVersion`, `kind`, `name`) are wire-stable: downstream consumers that do
producer-aware reverse lookup (such as stageset-controller's RFC-0012 resolution)
match on this triple. Renaming any field is a breaking change.

### `status.artifact`

After a successful publish, JaaS writes the following fields under `status.artifact`:

| Field | Type | Description |
|---|---|---|
| `url` | string | HTTP URL of the published tarball. Revision-addressed: `<storage-base-url>/<namespace>/<name>/<sha256-hex>.tar.gz`. Byte-stable for the lifetime of the revision in the keep-set — re-publishing a different revision does not mutate the bytes at this URL. |
| `path` | string | Storage-backend-relative path of the tarball. |
| `revision` | string | `sha256:<hex>` content hash of the artifact. In `rendered` output mode this is the sha256 of the evaluated JSON; in `source` mode it is a deterministic hash over all source files (sorted by filename). |
| `digest` | string | `sha256:<hex>` of the tarball bytes (the `.tar.gz` itself, not the content). Used by Flux consumers to verify integrity after download. |
| `size` | int64 | Tarball size in bytes. |
| `lastUpdateTime` | string | RFC3339 timestamp of the most recent successful publish. |

### `status.conditions`

JaaS writes a single `Ready` condition on every successful publish:

```yaml
status:
  conditions:
    - type: Ready
      status: "True"
      reason: Succeeded
      message: artifact published
      lastTransitionTime: <RFC3339>
      observedGeneration: <generation>
```

`lastTransitionTime` is preserved across steady-state republishes (same
`Ready=True`, new revision) so the timestamp does not churn. It advances only
when the condition transitions (e.g. from `False` to `True` after a failure
clears).
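
The preservation rule follows the usual Kubernetes condition convention. A minimal Python sketch of it, assuming conditions are plain dictionaries; `set_condition` is a hypothetical helper and not JaaS code:

```python
from datetime import datetime, timezone


def set_condition(conditions: list, new: dict, now: datetime) -> list:
    """Upsert `new` by type, keeping lastTransitionTime unless status flips."""
    # `now` is expected to be timezone-aware; it is rendered as RFC3339 UTC.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    stamped = dict(new, lastTransitionTime=stamp)
    result = []
    replaced = False
    for cond in conditions:
        if cond["type"] != new["type"]:
            result.append(cond)
            continue
        if cond["status"] == new["status"]:
            # Same status: a steady-state republish keeps the old timestamp.
            stamped["lastTransitionTime"] = cond["lastTransitionTime"]
        result.append(stamped)
        replaced = True
    if not replaced:
        result.append(stamped)
    return result
```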

## What downstream consumers rely on

**Gate on `Ready=True` before fetching.** Every Flux consumer — including
`kustomize-controller`, `helm-controller`, and JaaS's own chained-snippet
`sourceRef` resolver — treats an `ExternalArtifact` as not-yet-consumable until
`status.conditions[Ready].status == "True"`. A snippet that has not yet
completed its first successful reconcile has no `Ready` condition (or reports
`Ready=False`), which leaves chained snippets blocked with reason `SourceNotReady`.
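
A consumer-side gate might look like the following Python sketch. It assumes the `ExternalArtifact` has already been read into a dictionary (for example from `kubectl get -o json`); `is_consumable` is a hypothetical name.

```python
def is_consumable(external_artifact: dict) -> bool:
    """Return True only when Ready=True and an artifact is recorded."""
    status = external_artifact.get("status") or {}
    conditions = status.get("conditions") or []
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None or ready.get("status") != "True":
        return False
    # A Ready source without status.artifact is still not fetchable.
    return bool(status.get("artifact"))
```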

**URL is revision-addressed and byte-stable.** The URL published in
`status.artifact.url` has the form
`<storage-base-url>/<namespace>/<name>/<sha256-hex>.tar.gz`. The bytes at that
URL are immutable for as long as the revision is in the snippet's keep-set
(`spec.history`). Consumers can safely re-fetch a pinned URL (e.g. during
rollback) and verify it against the recorded `digest`. Once a revision leaves
the keep-set it is garbage-collected after the operator's GC grace period; a
fetch after that point returns 404.
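
Verifying a fetched tarball against the recorded `digest` needs only a SHA-256 over the downloaded bytes. A minimal Python sketch, where `verify_digest` is a hypothetical helper and the download itself is left to the caller:

```python
import hashlib
import hmac


def verify_digest(tarball: bytes, digest: str) -> bool:
    """Compare the sha256 of the tarball bytes with a `sha256:<hex>` digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {digest!r}")
    actual = hashlib.sha256(tarball).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())
```

A consumer would call this with the bytes fetched from `status.artifact.url` and the value of `status.artifact.digest`, and treat a mismatch the same way as a failed download.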

**Revision identifies content, not time.** Two publishes that produce identical
content (same evaluated JSON or same source files) yield the same `revision`.
Consumers that cache by revision can skip a re-fetch when the revision has not
changed.

**The snippet mirrors the artifact URL.** To avoid a second lookup, the
originating `JsonnetSnippet` also carries the URL in its own
`status.artifactURL`. `kubectl describe jsonnetsnippet` therefore surfaces the
artifact location directly.

## Tarball contents

The tarball layout depends on `spec.output` of the originating `JsonnetSnippet`:

| `spec.output` | Tarball contents |
|---|---|
| `rendered` (default) | A single `rendered.json` holding the evaluated JSON output. |
| `source` | Every source file from the resolved snippet source (inline `spec.files` or the files extracted from the `spec.sourceRef` tarball), with their original relative paths. |

All tarballs are produced deterministically: entries are sorted by path and
`ModTime` is zeroed. Two publishes from the same input produce byte-identical
`.tar.gz` files and therefore the same `revision` and `digest`.
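
The technique can be sketched with Python's standard library. The operator's exact tar headers and gzip settings are not documented here, so the choices below (USTAR format, default ownership fields, a zeroed gzip header timestamp) are assumptions; `build_tarball` is a hypothetical name.

```python
import gzip
import hashlib
import io
import tarfile


def build_tarball(files: dict) -> bytes:
    """Build a .tar.gz whose bytes depend only on paths and contents."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):  # entries sorted by path
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0  # zeroed modification time
            tar.addfile(info, io.BytesIO(data))
    out = io.BytesIO()
    # The gzip header also carries a timestamp and file name; pin both.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


first = build_tarball({"b.jsonnet": b"{ b: 2 }", "a.jsonnet": b"{ a: 1 }"})
second = build_tarball({"a.jsonnet": b"{ a: 1 }", "b.jsonnet": b"{ b: 2 }"})
print(first == second)
print("sha256:" + hashlib.sha256(first).hexdigest())
```

Because nothing time- or order-dependent reaches the output, hashing the result gives a stable value for the same input.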


---

# JsonnetLibrary

Source: https://jaas.projects.metio.wtf/api/jsonnetlibrary/


`JsonnetLibrary` (`jlib`) is a namespaced bundle of `.libsonnet` files that
`JsonnetSnippet` CRs in the same namespace can import. The library carries no
artifact of its own and has no controller reconciling it today — it exists purely
as a supply-side source for snippets. The import alias is set on the snippet side
via `LibraryRef.importPath` (defaulting to the library's `metadata.name`); the
library itself carries no registration name. Task-oriented guidance lives in
[Jsonnet libraries](/usage/jsonnet-libraries/).

## Example

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetLibrary
metadata:
  name: mylib
  namespace: default
spec:
  files:
    main.libsonnet: |
      {
        dashboard(env, cluster):: {
          title: '%s / %s' % [env, cluster],
        },
      }
```

Exactly one of `spec.files` or `spec.sourceRef` must be set. Admission rejects
CRs that set neither or both.
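
The exactly-one rule is easy to check before applying a manifest. A minimal Python sketch, assuming the `spec` has been loaded into a dictionary and that an empty `files` map counts as unset; `validate_source` is a hypothetical helper, not the admission webhook's code:

```python
def validate_source(spec: dict) -> None:
    """Raise ValueError unless exactly one of files or sourceRef is set."""
    has_files = bool(spec.get("files"))  # assumption: {} counts as unset
    has_source_ref = spec.get("sourceRef") is not None
    if has_files == has_source_ref:
        raise ValueError("exactly one of spec.files or spec.sourceRef must be set")
```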

## Spec fields

`JsonnetLibrarySpec` embeds `SnippetSource` directly (the same source shape used
by `JsonnetSnippetSpec`).

| Field | Type | Default | Description |
|---|---|---|---|
| `files` | map[string]string | — | Inline map of filename to Jsonnet/libsonnet source. Exactly one of `files` or `sourceRef` must be set. |
| `sourceRef.apiVersion` | string | `source.toolkit.fluxcd.io/v1` | APIVersion of the referenced Flux source CR. |
| `sourceRef.kind` | string | — | Kind of the referenced source. One of: `GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`. Required when `sourceRef` is set. |
| `sourceRef.name` | string | — | Name of the referenced source CR. Required when `sourceRef` is set. Minimum length 1. |
| `sourceRef.namespace` | string | library's namespace | Namespace of the referenced source CR. Cross-namespace references are rejected while `--no-cross-namespace-refs` is enabled (the default). |
| `sourceRef.path` | string | — (artifact root) | Subdirectory within the fetched tarball to treat as the library root. Empty means the archive root — required for jb-vendored trees (e.g. a `sourceRef` pointing at a Flux `OCIRepository` for a JOI image) where the library aliases resolve against the full vendor tree. |

## Status

`JsonnetLibrary` shares the `SyncStatus` type with `JsonnetSnippet`, though no
controller currently populates it. The `status` subresource exists so a future
library reconciler can be added without a schema change.

| Field | Type | Description |
|---|---|---|
| `observedGeneration` | int64 | `.metadata.generation` last reconciled. Not currently populated. |
| `conditions` | []Condition | Standard apimachinery conditions. Not currently populated. |
| `revision` | string | Not currently populated. |
| `artifactURL` | string | Not currently populated. |
| `lastSyncTime` | Time | Not currently populated. |
| `history` | []RevisionEntry | Not currently populated. |

For how snippets reference libraries, see [JsonnetSnippet](/api/jsonnetsnippet/)
(`spec.libraries`) and [Jsonnet libraries](/usage/jsonnet-libraries/).


---

# JsonnetSnippet

Source: https://jaas.projects.metio.wtf/api/jsonnetsnippet/


`JsonnetSnippet` (`jsnip`) is the published unit of Jsonnet evaluation. The JaaS
operator watches these namespaced CRs, evaluates the Jsonnet they describe, and
upserts a Flux `ExternalArtifact` whose `status.artifact.url` points at the
rendered result. Task-oriented guidance lives in
[Operator mode](/usage/operator-mode/) and [Snippet sources](/usage/snippet-sources/).

## Example

```yaml
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: hello-world
  namespace: default
spec:
  serviceAccountName: hello-world-tenant
  entryFile: main.jsonnet
  output: rendered
  history: 3
  interval: 10m
  suspend: false
  files:
    main.jsonnet: |
      local lib = import 'mylib/main.libsonnet';
      lib.dashboard(std.extVar('env'), std.extVar('cluster'))
  libraries:
    - kind: JsonnetLibrary
      name: mylib
      importPath: mylib
  externalVariables:
    env: production
    cluster: eu-west-1
  tlas:
    title:
      - My Dashboard
```

Exactly one of `spec.files` or `spec.sourceRef` must be set. Admission rejects
CRs that set neither or both.

## Spec fields

| Field | Type | Default | Description |
|---|---|---|---|
| `serviceAccountName` | string | — | ServiceAccount the operator impersonates for every Kubernetes API call made on behalf of this snippet (source fetches, ExternalArtifact upserts). Must exist in the snippet's namespace. When empty, the operator's `--default-service-account` applies. Reconciliation is denied when neither is set (`ReasonServiceAccountMissing`). |
| `entryFile` | string | `main.jsonnet` | File (relative to the resolved source root) that go-jsonnet evaluates. Restricted to `[A-Za-z0-9._/-]+` with no `..` segments. Maximum 255 characters. |
| `files` | map[string]string | — | Inline map of filename to Jsonnet source. Exactly one of `files` or `sourceRef` must be set. |
| `sourceRef.apiVersion` | string | `source.toolkit.fluxcd.io/v1` | APIVersion of the referenced Flux source CR. |
| `sourceRef.kind` | string | — | Kind of the referenced source. One of: `GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`. Required when `sourceRef` is set. |
| `sourceRef.name` | string | — | Name of the referenced source CR. Required when `sourceRef` is set. Minimum length 1. |
| `sourceRef.namespace` | string | snippet's namespace | Namespace of the referenced source CR. Cross-namespace references are rejected while `--no-cross-namespace-refs` is enabled (the default). |
| `sourceRef.path` | string | — (artifact root) | Subdirectory within the fetched tarball to treat as the source root. Empty means the archive root. |
| `libraries` | []LibraryRef | — | `JsonnetLibrary` CRs importable from this snippet. Libraries not listed here are invisible to the snippet even when present in the cluster. See [Jsonnet libraries](/usage/jsonnet-libraries/). |
| `libraries[*].apiVersion` | string | `jaas.metio.wtf/v1` | APIVersion of the library CR. |
| `libraries[*].kind` | string | — | Kind of the library CR. Currently only `JsonnetLibrary` is accepted. Required. |
| `libraries[*].name` | string | — | Name of the referenced `JsonnetLibrary` CR. Required. Minimum length 1. |
| `libraries[*].namespace` | string | snippet's namespace | Namespace of the referenced library CR. Cross-namespace references are rejected while `--no-cross-namespace-refs` is enabled (the default). |
| `libraries[*].importPath` | string | library's `metadata.name` | Alias used in `import` statements inside the snippet's Jsonnet source. Collisions with OCI-mounted shared library aliases are rejected at admission. |
| `tlas` | `map[string][]string` | — | Top-level arguments passed to the snippet's outermost function. A single-element value becomes a string TLA; multiple values are passed as a JSON-encoded array, matching the HTTP query-parameter convention. |
| `externalVariables` | map[string]string | — | Seeds `std.extVar` lookups for this snippet's evaluation. Keys that conflict with the operator's `--ext-var` set are rejected at admission; if admission is bypassed, the reconciler refuses the conflicting key with `ReasonExternalVariableConflict`. |
| `output` | string | `rendered` | What bytes the published ExternalArtifact carries. `rendered`: the evaluated JSON (a single `rendered.json` in the tarball). `source`: the raw `.jsonnet`/`.libsonnet` files, for downstream consumers that re-evaluate themselves. |
| `suspend` | bool | `false` | When `true`, the operator skips the evaluation pipeline, leaves the existing ExternalArtifact in place, and reports `Ready=False` with reason `Suspended`. Setting back to `false` resumes reconciliation. Mirrors Flux's `spec.suspend` convention. |
| `history` | int32 | `1` | Number of past revisions retained in storage. Minimum 1, maximum 50. Setting to N > 1 lets downstream consumers pin to an older revision via its sha256 for rollback or blue-green flows. The keep-set is tracked in `status.history`. |
| `interval` | Duration | — (watch-only) | Period between successful reconciles regardless of watch events. Picks up state outside the watched graph (environment drift, OCI library refreshes, etc.). Bounded at admission to between `30s` and `24h`. Failed reconciles use controller-runtime's exponential backoff; `interval` governs only the steady-state cadence. |
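
Two of the rules in the table, the `entryFile` restrictions and the `tlas` value convention, can be sketched in Python as follows. The helper names are hypothetical, the JSON formatting of multi-value TLAs is an assumption, and an empty value list is not covered by the table, so this sketch rejects it.

```python
import json
import re

# Character set taken from the `entryFile` row above.
ENTRY_FILE_PATTERN = re.compile(r"[A-Za-z0-9._/-]+")


def valid_entry_file(path: str) -> bool:
    """Check length, allowed characters, and the absence of `..` segments."""
    if len(path) > 255 or not ENTRY_FILE_PATTERN.fullmatch(path):
        return False
    return ".." not in path.split("/")


def tla_value(values: list) -> str:
    """One value stays a plain string; several become a JSON-encoded array."""
    if not values:
        raise ValueError("a TLA needs at least one value")
    if len(values) == 1:
        return values[0]
    return json.dumps(values)
```

With the example snippet above, `tla_value(["My Dashboard"])` returns the plain string, so the `title` argument arrives as a string rather than a one-element array.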

## Status

`status` follows the `SyncStatus` shape shared by all JaaS CRs.

| Field | Type | Description |
|---|---|---|
| `observedGeneration` | int64 | `.metadata.generation` of the spec the controller last reconciled. Lets clients distinguish stale status from up-to-date. |
| `conditions` | []Condition | Standard apimachinery conditions. The `Ready` condition summarises whether the most recent reconcile succeeded; `reason` and `message` carry per-stage failure detail. See Ready condition reasons below. |
| `revision` | string | `sha256:<hex>` content hash of the last successfully reconciled source. Empty until the first successful reconcile. |
| `artifactURL` | string | HTTP URL of the last successfully published artifact tarball. Preserved across subsequent failures so the last-known-good URL stays observable. Empty until the first successful publish. |
| `lastSyncTime` | Time | Timestamp of the most recent successful reconcile. |
| `history` | []RevisionEntry | Most-recent N revisions retained in storage (`N` = `spec.history`). Index 0 is the most recent (matches `revision`). Each entry carries `revision` (sha256:hex) and `time` (publish time). |
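
The `history` keep-set behaves like a bounded, most-recent-first list. The Python sketch below models only that shape. How the operator treats a republish of an identical revision is not stated here, so leaving the list unchanged in that case is an assumption, and `record_revision` is a hypothetical helper.

```python
def record_revision(history: list, revision: str, time: str, keep: int) -> list:
    """Return the keep-set after publishing `revision`, newest first."""
    if not 1 <= keep <= 50:
        raise ValueError("spec.history must be between 1 and 50")
    if history and history[0]["revision"] == revision:
        # Assumption: republishing the current revision changes nothing.
        return history[:keep]
    # Assumption: a revision that reappears moves to the front once.
    older = [entry for entry in history if entry["revision"] != revision]
    return ([{"revision": revision, "time": time}] + older)[:keep]
```

Entries that fall off the end of the list correspond to revisions leaving the keep-set.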

### Ready condition reasons

Every reason string is wire-stable — runbooks key off these values.

| Reason | Status | Description |
|---|---|---|
| `Synced` | True | Most recent reconcile completed end-to-end and produced a publishable artifact. |
| `Pending` | False | Snippet observed but not yet reconciled (transient). |
| `Suspended` | False | `spec.suspend` is `true`; evaluation is paused. |
| `InvalidSpec` | False | Spec-level validation failure (missing entry file such as `main.jsonnet`, invalid source combination, etc.). |
| `LibraryNotFound` | False | A `spec.libraries` entry references a `JsonnetLibrary` CR that cannot be found. |
| `CrossNamespaceRefRejected` | False | `--no-cross-namespace-refs` is enabled and a library or source reference is outside the snippet's namespace. |
| `ExternalVariableConflict` | False | `spec.externalVariables` names a key already owned by the operator's `--ext-var` set. |
| `ServiceAccountMissing` | False | Neither `spec.serviceAccountName` nor `--default-service-account` is set. |
| `EvaluationFailed` | False | go-jsonnet returned a diagnostic (syntax error, runtime error, etc.). |
| `EvaluationTimeout` | False | The eval deadline fired before the snippet finished. |
| `SourceNotReady` | False | The referenced Flux source CR exists but is not yet `Ready` or has no `status.artifact`. |
| `SourceFetchFailed` | False | Fetching or verifying the source artifact failed (HTTP error, digest mismatch, tar corruption). |
| `SourceRefNotYetSupported` | False | `spec.sourceRef` is set but the operator is running without `--enable-flux-integration`. Start the operator with that flag, or remove `spec.sourceRef` from the snippet. |
| `DependencyCycle` | False | The snippet's dependency chain (via `spec.sourceRef` or `spec.libraries`) transitively points back at itself. |
| `ArtifactTooLarge` | False | Rendered content exceeds the operator's `--max-artifact-bytes` limit. |
| `RBACDenied` | False | An apiserver call failed with Forbidden, or the source CR's kind is not registered. Non-transient — backoff is disabled. The message names the verb and resource the cluster operator must grant. |

A runbook page for each reason lives at `/runbooks/<reason-lowercased>/` on this site. See [Operator mode](/usage/operator-mode/) for lifecycle details and [ExternalArtifact output contract](/api/externalartifact/) for the artifact contract.


---

# Comparisons

Source: https://jaas.projects.metio.wtf/comparisons/


Where JaaS fits next to other tools that evaluate Jsonnet or render Kubernetes
objects. These pages compare the rendering approaches; for how the rendered output
is deployed, see the consuming controller's own documentation — for ordered, gated
rollouts that is [stageset-controller](https://stageset.projects.metio.wtf/).


---

# JaaS and grafana-operator

Source: https://jaas.projects.metio.wtf/comparisons/grafana-operator/


JaaS and the [grafana-operator](https://grafana.github.io/grafana-operator/) are
not alternatives — they do different jobs and are commonly used together. JaaS
*produces* dashboard JSON from Jsonnet; grafana-operator *consumes* dashboard
JSON and reconciles it into a Grafana instance.

## Division of labour

grafana-operator manages Grafana itself. It reconciles `GrafanaDashboard`,
`GrafanaDatasource`, `GrafanaFolder`, and related resources into one or more
Grafana instances, handling authentication, folder placement, datasource wiring,
and drift correction inside Grafana. A `GrafanaDashboard` can take its dashboard
model from inline JSON, a URL, a ConfigMap, a `grafana.com` dashboard ID, or a
remote source.

JaaS evaluates Jsonnet — including [grafonnet](https://grafana.github.io/grafonnet/)
— and publishes the rendered dashboard JSON as a Flux `ExternalArtifact`. It
knows nothing about Grafana: it renders JSON and stops there.

So the two compose along a clean seam. You author dashboards in grafonnet, JaaS
renders them to JSON, and grafana-operator takes that JSON and reconciles it into
Grafana. Each tool owns one half of the pipeline and neither reaches into the
other's domain.

## When grafana-operator alone is enough

If your dashboards are already plain JSON, or you consume them by `grafana.com`
dashboard ID, or you maintain them in the Grafana UI and export the model, then
grafana-operator covers the whole workflow on its own. There is no Jsonnet to
render, so there is nothing for JaaS to do. Reach for grafana-operator by itself
whenever the dashboard model exists as static JSON.

## When to add JaaS

Add JaaS when your dashboards are authored in grafonnet (or any Jsonnet),
typically to share panels, variables, and layout helpers across many dashboards
instead of duplicating JSON. JaaS turns that Jsonnet into the JSON
grafana-operator expects, with the same `jsonnet -J vendor` import resolution
you use locally, so a dashboard renders identically on your workstation and
in-cluster. grafana-operator then reconciles the rendered output as it would any
other dashboard JSON.

## Wiring them together

The grafana-operator project documents the JaaS integration directly, including
the `GrafanaDashboard` configuration that points at a JaaS-rendered artifact:
[grafana-operator dashboard example with JaaS](https://grafana.github.io/grafana-operator/docs/examples/dashboard/jaas/readme/).
Keep all `GrafanaDashboard`, datasource, and folder configuration on the
grafana-operator side; JaaS contributes only the rendering step and the
`ExternalArtifact` it publishes.

The [Grafana dashboards](/tutorials/grafana-dashboards/) tutorial shows the JaaS
side — authoring a grafonnet dashboard as a `JsonnetSnippet` and publishing the
rendered JSON.


---

# JaaS vs jsonnet-controller

Source: https://jaas.projects.metio.wtf/comparisons/jsonnet-controller/


[pelotech/jsonnet-controller](https://github.com/pelotech/jsonnet-controller) is
a Flux-style controller that builds Jsonnet inside the controller and applies the
result to the cluster, with a configuration model compatible with
[kubecfg](https://github.com/kubecfg/kubecfg). JaaS and jsonnet-controller both
turn Jsonnet into Kubernetes objects under Flux, but they draw the boundary
between rendering and deployment in different places.

## Coupled build-and-apply vs a rendering service

jsonnet-controller couples the two halves: one controller reads a Jsonnet
source, evaluates it, and applies the resulting objects to the cluster. The
rendered output lives inside the controller's reconcile loop; the unit of
configuration is "build this Jsonnet and apply it here."

JaaS separates them. The JaaS operator renders a `JsonnetSnippet` and publishes
the result as a content-addressed Flux `ExternalArtifact` — a tarball any
source-controller-speaking consumer can fetch. JaaS does not apply anything. A
separate consumer (a Flux `Kustomization`, a `HelmRelease`, or a
[stageset-controller](https://stageset.projects.metio.wtf/) `StageSet`) reads the
artifact and applies it. The rendered bytes are a first-class, addressable object
that more than one consumer can reference, pin to a revision, or roll back to.

That same rendering is also reachable over HTTP, so callers that are not Flux
consumers — a CI step, another service — can request a render from the same
engine that produces the in-cluster artifacts. See
[operator mode](/usage/operator-mode/) for how a snippet becomes an artifact.

## Where jsonnet-controller fits

jsonnet-controller is the more direct choice when its model matches your needs:

- **kubecfg compatibility.** If you already organise Jsonnet the kubecfg way —
  its import conventions, its top-level structure — jsonnet-controller consumes
  that directly without restructuring.
- **One object per build-and-apply.** When you want a single Flux-style
  resource that both renders a source and applies it, with no intermediate
  artifact to manage, jsonnet-controller keeps the pipeline to one moving part.

JaaS is the better fit when you want the rendered output to be an addressable,
revisioned artifact that several consumers can share, when you want the same
renderer available over HTTP to non-Flux callers, or when you want rendering and
deployment owned by separate, independently-evolving controllers.

## The deployment-side comparison

The comparison above is from the rendering angle — rendering as a service that
produces an artifact, versus build-and-apply in one controller. For the
deployment-side comparison against jsonnet-controller — ordered and gated apply,
health gating between stages, rollback — see stageset-controller's own page:
[stageset-controller vs jsonnet-controller](https://stageset.projects.metio.wtf/comparisons/jsonnet-controller/).


---

# JaaS vs Tanka

Source: https://jaas.projects.metio.wtf/comparisons/tanka/


[Grafana Tanka](https://tanka.dev) and JaaS both render Kubernetes manifests
from Jsonnet, and both build on the same [`jsonnet-bundler`](https://github.com/jsonnet-bundler/jsonnet-bundler)
vendoring conventions. The difference is *where the rendering runs and how the
result reaches the cluster*.

## The two models

Tanka renders and applies from a developer workstation or a CI runner. You
organise your code as environments (`environments/<env>/main.jsonnet` plus a
`spec.json`), run `tk show` or `tk export` to inspect the rendered objects, and
`tk apply` to push them to the cluster the environment names in its `apiServer`
field. The workstation or CI runner needs the Jsonnet toolchain, the vendored
library tree, and credentials for the target cluster.

JaaS renders in-cluster. A `JsonnetSnippet` names the same Jsonnet entry file;
the JaaS operator evaluates it continuously and publishes the result as a Flux
`ExternalArtifact`. A Flux `Kustomization` (or `HelmRelease`, or for ordered
rollouts a [stageset-controller](https://stageset.projects.metio.wtf/)
`StageSet`) consumes that artifact and applies it through the cluster's own
GitOps pull loop. No workstation or CI runner holds cluster credentials, and
developers do not need the Jsonnet toolchain or the vendor tree to ship a
change.

## When Tanka is the better fit

Tanka stays the stronger choice when its model matches the work:

- **Ad-hoc and exploratory renders.** `tk show` / `tk diff` give an immediate,
  local preview of exactly what would be applied, with no operator, no
  `ExternalArtifact`, and no consumer to configure.
- **Environments-as-code as the organising abstraction.** Tanka's
  environment/`spec.json` model — `namespace`, `injectLabels`, `apiServer`
  per environment, `tk env list` to enumerate them — is a first-class feature
  with no direct JaaS equivalent; JaaS pushes namespace and label concerns down
  to the consuming Flux `Kustomization` instead.
- **Direct, imperative apply.** When a human running `tk apply` against a
  named cluster is the intended workflow — small teams, bootstrap steps,
  break-glass operations — a pull loop adds machinery you may not need.

JaaS becomes the better fit when you want a pull-based GitOps loop, continuous
reconciliation and drift correction, server-side rendering so laptops and CI
hold no cluster credentials, per-tenant RBAC isolation on each render, and —
through stageset-controller — ordered, gated progressive delivery.

## Your imports resolve identically

JaaS's importer implements the **same resolution as `jsonnet -J vendor`**, the
scheme Tanka uses. A bare `import 'foo/main.libsonnet'` finds the library by
alias; an absolute `import 'github.com/.../gen/...'` resolves against the
vendored tree; sibling and `../` relative imports resolve from the importing
file. A `jb`-vendored tree (k8s-libsonnet, grafonnet, and the like) renders the
same bytes through JaaS as it does under `tk show`. Migration is mostly about
*where the files live*, not *rewriting Jsonnet*. See
[Jsonnet libraries](/usage/jsonnet-libraries/) for how libraries reach a
snippet.

One behaviour to plan for: Tanka walks the evaluated object and extracts every
nested `{apiVersion, kind, …}` into a resource stream. JaaS publishes exactly
what the entry file evaluates to. Make the entry file emit a flat manifest
stream — wrap resources in a `v1` `List`, or apply `std.objectValues(...)` over
your Tanka object — so the consuming Flux `Kustomization` applies every
resource.
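
To make the difference concrete, the Python sketch below mimics the extraction step on an already-evaluated object and wraps the result in a `v1` `List`. It illustrates the shape the entry file should emit and is not Tanka's implementation; in practice the flattening happens in Jsonnet inside the entry file. Both function names and the key-sorted ordering are assumptions.

```python
def extract_resources(node) -> list:
    """Collect every nested object that has both apiVersion and kind."""
    if isinstance(node, dict):
        if "apiVersion" in node and "kind" in node:
            return [node]
        found = []
        for key in sorted(node):  # deterministic order (an assumption)
            found.extend(extract_resources(node[key]))
        return found
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(extract_resources(item))
        return found
    return []


def as_list(resources: list) -> dict:
    """Wrap resources in a v1 List so a consumer sees a flat manifest."""
    return {"apiVersion": "v1", "kind": "List", "items": resources}
```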

## A migration path

| Tanka | JaaS / Flux |
|---|---|
| `vendor/` (jb-installed libs) | `JsonnetLibrary` with a `sourceRef`, or OCI-mounted libraries |
| `lib/` (project-shared libs) | `JsonnetLibrary` with inline `files` in the same namespace |
| `environments/<env>/main.jsonnet` | one `JsonnetSnippet` (`spec.entryFile` + `spec.files`/`spec.sourceRef`) |
| `import` resolution (`-J vendor`) | identical — JaaS's in-memory importer |
| per-env `spec.json` / conditionals | `spec.externalVariables` (`std.extVar`) and `spec.tlas` (top-level args) |
| `spec.json` `namespace` | Flux `Kustomization.spec.targetNamespace` |
| `spec.json` `injectLabels` | Flux `Kustomization.spec.commonMetadata.labels` |
| `tk show` / `tk export` | the JaaS operator, continuously → `ExternalArtifact` |
| `tk apply` | Flux kustomize-/helm-controller (pull) |
| `tk diff` | Flux drift detection; stageset verification between stages |
| `tk env list` | `kubectl get jsonnetsnippets -A` |

The conversion in three moves:

1. **Move the libraries.** Shared, versioned libraries (k8s-libsonnet,
   grafonnet) become a `JsonnetLibrary` backed by an `OCIRepository`, or a
   static OCI-mounted library on the operator. A project-local `lib/` becomes a
   `JsonnetLibrary` with inline `files`. The import alias is preserved either
   way.

2. **Turn each environment into a `JsonnetSnippet`.** `environments/team-a/main.jsonnet`
   becomes one snippet. Prefer `spec.sourceRef` (a Flux
   `GitRepository`/`OCIRepository`/`Bucket`) over inline `files` for real
   repositories, so Flux versions the source and JaaS re-renders on every
   commit. Per-environment differences move to `spec.externalVariables` and
   `spec.tlas`, so two environments sharing one library become two snippets that
   differ only in those fields.

3. **Replace apply with GitOps.** `tk apply` goes away. A Flux `Kustomization`
   points its `sourceRef` at the snippet's `ExternalArtifact`; Flux applies it
   and reconciles it continuously. The
   [Deploying manifests](/tutorials/deploying-manifests/) tutorial walks this
   end to end.

## What changes

There is no single `tk diff` preview — use Flux's drift detection and
stageset's between-stage verification instead. Namespace and label injection moves from
`spec.json` to the consuming Flux `Kustomization`, where both are first-class
fields. Your Jsonnet and libraries come across unchanged.


---

# JaaS vs the jsonnet CLI

Source: https://jaas.projects.metio.wtf/comparisons/jsonnet-cli/


The [`jsonnet`](https://jsonnet.org/) command-line tool — usually paired with
[`jsonnet-bundler`](https://github.com/jsonnet-bundler/jsonnet-bundler) (`jb`)
for vendoring libraries — evaluates Jsonnet to JSON on your machine. JaaS runs
the **same go-jsonnet core** as a service. This is not a question of which
implementation is correct; it is a question of *where the evaluation runs and
what surrounds it*.

## What the service adds

Over a local binary invocation, JaaS adds:

- **An HTTP endpoint other systems can call.** `GET /jsonnet/<snippet>` returns
  the evaluated JSON, with Top Level Arguments supplied as query parameters and
  external variables configured on the service. Anything that speaks HTTP can
  request a render without installing the toolchain or the vendor tree. See the
  [rendering endpoint](/usage/rendering-endpoint/) usage page.
- **An operator that turns a snippet into a revisioned Flux artifact.** With
  `--enable-flux-integration`, a `JsonnetSnippet` is evaluated continuously and
  published as a content-addressed `ExternalArtifact` that Flux consumers apply
  in-cluster — re-rendered automatically when its source changes. See
  [operator mode](/usage/operator-mode/).
- **Import resolution that matches `jsonnet -J vendor`.** JaaS resolves imports
  with the same semantics as the CLI's JPATH/vendor search, so the JSON a
  snippet produces under the service matches what the CLI produces locally.
- **Evaluation caps.** `--evaluation-timeout` bounds wall-clock time per render,
  `--max-stack` bounds call-stack depth, and `--max-concurrent-evals` bounds how
  many evaluations run at once — so one expensive snippet cannot exhaust a
  shared server. The CLI imposes none of these on its own.
- **Read-scope sandboxing.** Snippet-name resolution goes through Go's
  `os.Root`, which rejects `..` traversal and symlinks that escape the
  configured snippet directory, so a crafted request cannot read arbitrary
  files. The [evaluation and security](/usage/evaluation-and-security/) page
  details the caps and the boundaries.
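
As a rough illustration of the endpoint bullet above, the sketch below assembles a render URL with Top Level Arguments as query parameters. The base URL `http://localhost:8080`, the snippet name `dashboard`, and the argument names are placeholders, `render_url` is a hypothetical helper, and values are assumed to be URL-safe already (no percent-encoding is performed).

```shell
# Sketch: build a GET /jsonnet/<snippet> URL with TLAs as query parameters.
# render_url is a hypothetical helper; base URL and names are placeholders.
render_url() {
  base=$1
  snippet=$2
  shift 2
  query=""
  for kv in "$@"; do
    # Join key=value pairs with '&'; assumes values need no percent-encoding.
    query="${query:+$query&}$kv"
  done
  printf '%s/jsonnet/%s%s\n' "$base" "$snippet" "${query:+?$query}"
}

render_url http://localhost:8080 dashboard env=prod region=eu
```

The printed URL can be handed to any HTTP client, for example `curl -s "$(render_url ...)"`, once it points at a reachable JaaS instance.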

## When the plain CLI is the right tool

The CLI is the better choice for work that is local and one-off:

- **One-off local renders** — inspecting what a snippet produces, debugging a
  library, iterating on a dashboard before committing it.
- **CI scripts** — a build step that renders Jsonnet to JSON and hands it to
  another tool, where standing up a service would add a moving part for no gain.
- **Anywhere a service is unwanted** — no HTTP endpoint to call, no cluster, no
  artifact to consume.

Because JaaS runs the same go-jsonnet core, these are not mutually exclusive:
you can keep `jsonnet` and `jb` on your workstation and in CI, and run JaaS
in-cluster for the server-side and GitOps paths, with both producing the same
JSON for the same input. The [local rendering](/tutorials/local-rendering/)
tutorial shows JaaS used purely as a renderer, which keeps the local and
in-cluster output aligned.


---

# Runbooks

Source: https://jaas.projects.metio.wtf/runbooks/


One page per Ready-condition `Reason` the operator sets, plus cross-cutting
incident guides. Each page covers the symptom, the cause, how to diagnose it, and
how to remediate.

The operator automatically appends a link to the matching page on each actionable
Ready-condition message — `(runbook: https://jaas.projects.metio.wtf/runbooks/<reason>/)` —
so `kubectl describe` points straight at the remediation page. Healthy reasons
(`Synced`, `Suspended`, `Pending`) get no link.
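
To make the link format concrete, the sketch below derives the URL for a given Reason. It assumes the path segment is the lowercased Reason, which is inferred from the page URLs in this documentation rather than from the operator's source; `runbook_url` is a hypothetical helper.

```shell
# Sketch: build the runbook link for a Ready-condition Reason.
# Assumption: the path segment is the lowercased Reason (inferred from the
# page URLs in these docs). runbook_url is a hypothetical helper name.
runbook_url() {
  case "$1" in
    ''|Synced|Suspended|Pending) return 0 ;;  # healthy reasons get no link
  esac
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'https://jaas.projects.metio.wtf/runbooks/%s/\n' "$slug"
}

runbook_url ArtifactTooLarge
```

For `ArtifactTooLarge` this prints the URL of the ArtifactTooLarge page; for `Synced` it prints nothing.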


---

# ArtifactTooLarge

Source: https://jaas.projects.metio.wtf/runbooks/artifacttoolarge/


## Symptom

`READY=False`, `REASON=ArtifactTooLarge`. The Message states the rendered byte count and the configured cap.

## Cause

The snippet's rendered output exceeds the operator's `--max-artifact-bytes` (Helm: `operator.storage.maxArtifactBytes`). The cap is a defense-in-depth control — one runaway snippet shouldn't fill a shared storage volume.

Common triggers:

- a snippet generating massive arrays via `std.range(n)` with a much larger `n` than intended
- accidentally inlining a large data fixture via `importstr`
- forgetting to project / filter when fanning out per-tenant configs

## Diagnosis

Check the rendered size locally:

```shell
jsonnet /tmp/snippet/<entry-file> | wc -c
```
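
The size check above can be turned into a pass/fail comparison against the cap. This is a minimal sketch: `check_artifact_size` is a hypothetical helper, not a JaaS command, and the sample file only stands in for real rendered output.

```shell
# Sketch: compare a rendered file's size against a byte cap.
# check_artifact_size is a hypothetical helper, not part of JaaS.
check_artifact_size() {
  file=$1
  cap=$2
  size=$(wc -c < "$file" | tr -d ' ')   # tr strips padding some wc builds add
  if [ "$size" -gt "$cap" ]; then
    echo "over cap: $size > $cap bytes"
    return 1
  fi
  echo "within cap: $size <= $cap bytes"
}

rendered=$(mktemp)
printf '{"a":1}\n' > "$rendered"        # stand-in for rendered output
check_artifact_size "$rendered" 10485760 || true
rm -f "$rendered"
```

In practice, redirect the `jsonnet` output into a file and pass the operator's configured `--max-artifact-bytes` value as the second argument.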

## Remediation

Two paths:

1. **Shrink the output.** Inspect the snippet for unintended fan-out; project only the fields downstream consumers actually need.
2. **Raise the cap.** `--max-artifact-bytes=10485760` (10 MiB) gives more headroom. Pair it with PVC sizing in the chart so the volume can hold N revisions' worth of artifacts at the new maximum.

If many snippets are bumping against the cap, the cap itself may be too low for the workload. Review the cluster-wide ratio of total storage to the per-snippet revision count.
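
A back-of-the-envelope sizing check can help when raising the cap. The sketch below assumes the worst case, in which every retained revision of every snippet reaches the cap; the revision and snippet counts are made-up inputs, not chart defaults, and `worst_case_bytes` is a hypothetical helper.

```shell
# Sketch: worst-case storage if every retained revision of every snippet
# hits the cap. Revision and snippet counts are illustrative inputs only.
worst_case_bytes() {
  cap=$1
  revisions=$2
  snippets=$3
  echo $((cap * revisions * snippets))
}

cap=$((10 * 1024 * 1024))   # 10 MiB, i.e. --max-artifact-bytes=10485760
echo "cap: $cap bytes"
echo "worst case: $(worst_case_bytes "$cap" 5 20) bytes"
```

With these inputs the worst case is 1048576000 bytes (1000 MiB), which gives a lower bound for the volume size.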


---

# CRD watch engagement failing

Source: https://jaas.projects.metio.wtf/runbooks/crd-watch-engagement/


Fires when `jaas_crd_watch_engagement_failures_total{gvk=...}` has increased above the per-hour threshold for the alert window. JaaS lazy-watches Flux source CRDs: at boot, only the CRDs already installed get a watch; when a previously missing CRD becomes `Established=True` (for example, because source-controller was installed after JaaS), the `crdWatcher` engages a runtime watch on it via `Controller.Watch`. **When that engagement fails, the apiextensions informer fires no further events on a stable CRD**, so the watch stays un-engaged until either something else changes the CRD object's metadata or status, or the operator restarts.

The visible symptom is that snippets with `spec.sourceRef.kind=<the affected kind>` stop re-rendering on upstream source updates. There is no per-snippet status signal: they sit at their last-rendered revision, drifting from upstream.

## Symptom

- `JaaSCRDWatchEngagementFailing` alert is firing with `gvk` labelling the affected kind.
- `kubectl describe jsonnetsnippet` on snippets referencing that GVK shows a Ready condition that hasn't moved in hours/days.
- Upstream Flux source CRs (GitRepository, OCIRepository, Bucket, ExternalArtifact) show recent `status.artifact.revision` changes that the JaaS snippets aren't picking up.

## Diagnosis

### Step 1 — confirm the CRD is actually installed and Established

```shell
kubectl get crd <plural>.source.toolkit.fluxcd.io \
  --output jsonpath='{.status.conditions[?(@.type=="Established")].status}{"\n"}'
```

Expect `True`. If the CRD is not installed or not yet Established, the watcher is correct to skip; install / wait.

### Step 2 — check the operator's RBAC on the source kind

```shell
kubectl auth can-i list <plural>.source.toolkit.fluxcd.io \
  --as=system:serviceaccount:<ns>:<operator-sa>
kubectl auth can-i watch <plural>.source.toolkit.fluxcd.io \
  --as=system:serviceaccount:<ns>:<operator-sa>
```

If either is "no", the chart's `operator-tenants` ClusterRole (or per-namespace RoleBinding when `watchNamespaces` is set) is missing the `get/list/watch` verbs on this kind. Update the chart's `FluxSourceKinds` mapping or add the verb manually.

### Step 3 — check controller-runtime cache state

```shell
kubectl --namespace <ns> logs <operator-pod> | grep -E 'engage|Failed to watch|cache' | tail -20
```

Look for `cache reconnect`, `informer failed`, or `Watch failed: forbidden`. A transient cache reconnect during a heavy load period can trip engagement once; the DD7 bounded-retry mechanism re-engages automatically. Sustained failures point at RBAC or a misconfigured `MetricsBindAddress`.

## Remediation

1. **Fix the verb / kind / RBAC** issue identified above.
2. **Roll the operator pod** to force a fresh `SetupWithManager` pass, which re-detects every Flux CRD and re-engages watches that succeed on first try:

   ```shell
   kubectl --namespace <ns> rollout restart deployment <operator-deployment>
   ```

3. **Verify** the counter stops increasing and the alert clears.

## When the alert is noisy

If `jaas_crd_watch_engagement_failures_total` ticks once at boot but never again, that's the expected DD7 bounded-retry behavior: the first attempt failed (transient race during cache start), the retry succeeded. Raise `crdWatchEngagementFailuresPerHour` if the boot-time blip is noisy enough to page.


---

# CrossNamespaceRefRejected

Source: https://jaas.projects.metio.wtf/runbooks/crossnamespacerefrejected/


## Symptom

`READY=False`, `REASON=CrossNamespaceRefRejected`. The Message names the offending reference (a library or a sourceRef).

## Cause

The operator is running with `--no-cross-namespace-refs=true` (the chart default) and the snippet references a library or Flux source in a different namespace.

This is a deliberate isolation control — it mirrors Flux's `--no-cross-namespace-refs` and stops a tenant in namespace A from reaching libraries / sources in namespace B without an explicit relationship.

## Diagnosis

Inspect the spec and identify which reference points outside the snippet's namespace:

```shell
kubectl --namespace <ns> get jsonnetsnippet <name> --output yaml | grep -E "namespace:|sourceRef:|libraries:"
```

## Remediation

Three options, by isolation strength:

1. **(recommended)** Duplicate the library / source CR into the snippet's namespace.
2. Promote the library to an OCI volume, mounted via the chart's `additionalLibraries` map. It becomes part of the operator's filesystem, available to every snippet without a cross-namespace CR ref.
3. **(loosen isolation, cluster-wide)** Set `--no-cross-namespace-refs=false` on the operator. Affects every tenant in the cluster — only do this when tenants are mutually trusting.


---

# DependencyCycle

Source: https://jaas.projects.metio.wtf/runbooks/dependencycycle/


## Symptom

`READY=False`, `REASON=DependencyCycle`. The Message names the snippet that closes the cycle.

## Cause

The snippet's `spec.sourceRef` chain transitively points back at the snippet itself. The reconciler detects this and refuses to publish so chained snippets don't loop forever (each republish would trigger every downstream snippet to re-render, which would re-trigger the upstream, and so on).

Two cycle shapes:

1. **Direct sourceRef cycle:** `A.spec.sourceRef → ExternalArtifact/A`. A snippet sourcing from its own published artifact.
2. **Library-mediated cycle:** `A.spec.libraries → JsonnetLibrary/L`, where `L.spec.sourceRef → ExternalArtifact/A` (or a longer chain back to A).

The validating webhook (`--enable-webhook`) rejects new CRs that introduce a cycle at admission; the reconciler check is a fallback for when admission is bypassed or the cycle is introduced retroactively (e.g., adding a new library that closes a loop with existing snippets).

## Diagnosis

Walk the chain manually:

```shell
kubectl get jsonnetsnippet <name> --output jsonpath='{.spec.sourceRef}' && echo
# Then inspect what that sourceRef points at, and what it sources from in turn.
```

For library-mediated cycles, the chain is:

```text
snippet A.spec.libraries[i] → JsonnetLibrary L → L.spec.sourceRef → ExternalArtifact X → snippet that publishes X
```

If the publishing snippet at the end is A, you have a cycle.
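
Once the edges are written down, the walk can be automated. The sketch below runs a breadth-first search over a hand-written edge list and reports whether the chain leads back to the starting snippet. The `from to` edge format and the `has_cycle` helper are assumptions made for illustration; JaaS does not emit this format.

```shell
# Sketch: breadth-first walk over "from to" edges read from stdin.
# Prints "cycle" if the start node is reachable from itself.
# has_cycle and the edge-list format are hypothetical.
has_cycle() {
  awk -v start="$1" '
    { n = ++deg[$1]; edge[$1, n] = $2 }   # adjacency list per source node
    END {
      head = 1; tail = 1; queue[1] = start
      while (head <= tail) {
        node = queue[head++]
        for (i = 1; i <= deg[node]; i++) {
          next_node = edge[node, i]
          if (next_node == start) { print "cycle"; exit 0 }
          if (!(next_node in seen)) { seen[next_node] = 1; queue[++tail] = next_node }
        }
      }
      print "no cycle"
    }
  '
}

# Library-mediated shape: A uses L, L sources from A's artifact.
has_cycle snippet/A <<'EOF'
snippet/A library/L
library/L artifact/A
artifact/A snippet/A
EOF
```

Each edge comes from one `kubectl get ... --output jsonpath` lookup on `spec.sourceRef` or `spec.libraries`; the example input prints `cycle`.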

## Remediation

Break the cycle by removing the back-edge. Common fixes:

- detach a library from its sourceRef (inline its files instead, if small)
- have the upstream snippet publish a smaller artifact the downstream doesn't need to re-consume
- restructure so the shared data lives in a static ConfigMap referenced as a sourceRef-equivalent, not in a snippet output


---

# Eval-concurrency saturation

Source: https://jaas.projects.metio.wtf/runbooks/eval-saturation/


Not tied to a single `Reason` — this page covers what to do when the global concurrent-eval cap (`--max-concurrent-evals`) is full and JaaS is shedding new evaluations. The cap exists because the synchronous go-jsonnet API has no context-aware cancellation: once an eval starts it runs to natural completion, so an unbounded queue lets a runaway snippet pile up goroutines that outlive every caller's deadline.

## Symptom

One or more of:

- `JaaSEvalSaturation` is firing — `jaas_eval_in_flight / jaas_eval_max_concurrent` has been above the threshold (default `0.9`) for the alert window.
- `JaaSEvalRejected` is firing — `rate(jaas_eval_unavailable_total[5m])` has been above the threshold.
- HTTP clients see `503 Service Unavailable` with body `{"error": "evaluation_unavailable", "message": "concurrent-eval cap is full; retry after backoff"}`.
- `kubectl describe jsonnetsnippet` shows recurring `Warning EvalUnavailable` events with message `reconcile deferred for 1s by --max-concurrent-evals`. Ready condition stays untouched (backpressure is not failure).
- `jaas_eval_outstanding_timed_out` is also elevated — confirms the runaway-snippet diagnosis: orphaned evals are pinning slots while their parents have already given up.

## Diagnosis: why is the cap full?

The cap fills for two distinct reasons. The right remediation depends on which.

### Path A — runaway snippet (goroutines outliving their ctx)

Read the leak gauge (`jaas_eval_outstanding_timed_out`). If it's non-zero and trending up, evaluations are starting but not finishing, almost always because of a single snippet whose work dwarfs `--evaluation-timeout`.

```shell
# Live count of evals whose parent reconcile already timed out:
kubectl --namespace <jaas-ns> exec deploy/jaas -- \
  wget -qO- http://localhost:8083/metrics | grep jaas_eval_outstanding_timed_out
```
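
The same scrape also contains the two gauges behind the `JaaSEvalSaturation` ratio. As a minimal sketch, the filter below computes `jaas_eval_in_flight / jaas_eval_max_concurrent` from metrics text on stdin. It assumes both gauges are exposed without labels; the sample values are invented and `eval_saturation` is a hypothetical helper.

```shell
# Sketch: compute in-flight / max-concurrent from Prometheus text on stdin.
# Assumption: both gauges carry no labels. Sample values are invented.
eval_saturation() {
  awk '
    $1 == "jaas_eval_in_flight"      { in_flight = $2 }
    $1 == "jaas_eval_max_concurrent" { max = $2 }
    END {
      if (max > 0) printf "%.2f\n", in_flight / max
      else         print "n/a"
    }
  '
}

eval_saturation <<'EOF'
# TYPE jaas_eval_in_flight gauge
jaas_eval_in_flight 15
# TYPE jaas_eval_max_concurrent gauge
jaas_eval_max_concurrent 16
jaas_eval_outstanding_timed_out 3
EOF
```

Pipe the `wget -qO-` output from the command above into `eval_saturation` to get the live ratio; a result above the alert threshold (default `0.9`) matches the firing condition.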

To find the culprit, scan recent reconcile logs for `Jsonnet evaluation timed out` followed by repeated `EvalUnavailable` warnings on the same snippet:

```shell
kubectl --namespace <jaas-ns> logs deploy/jaas --since=15m \
  | grep -E 'EvaluationTimeout|EvalUnavailable' \
  | sort | uniq -c | sort -rn | head
```

The snippet whose name dominates that list is the culprit. Common causes:

- Deep recursion that takes seconds-to-minutes to complete naturally even after the parent deadline fires.
- Pathological library import that triggers go-jsonnet's worst-case eval order.
- A `std.foldl` over a generated array of millions of entries.
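
If each log line carries a timestamp or another per-event field, counting whole lines with `sort | uniq -c` will not group entries by snippet. A possible workaround is to extract the snippet name first. The sketch below assumes JSON log lines containing a `"name":"<snippet>"` field; that format is an assumption, not documented JaaS output, so adjust the pattern to the logs you actually see. `top_offenders` is a hypothetical helper.

```shell
# Sketch: count timeout/unavailable log lines per snippet name.
# Assumption: JSON logs with a "name":"<snippet>" field (not confirmed).
top_offenders() {
  grep -E 'EvaluationTimeout|EvalUnavailable' \
    | sed -n 's/.*"name":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn | head
}

# Invented sample lines, standing in for `kubectl logs` output.
printf '%s\n' \
  '{"ts":"10:00:01","reason":"EvaluationTimeout","name":"slow-dashboard"}' \
  '{"ts":"10:00:02","reason":"EvalUnavailable","name":"slow-dashboard"}' \
  '{"ts":"10:00:03","reason":"EvalUnavailable","name":"team-a-config"}' \
  | top_offenders
```

With real logs, pipe the `kubectl logs ... --since=15m` output into `top_offenders`; the name with the highest count is the first candidate for path A.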

### Path B — genuine load above the cap

Leak gauge is at zero (or steady, not growing), `jaas_eval_in_flight` is pegged near the cap, and many distinct snippets show `EvalUnavailable` events. The cap is sized too small for the workload.

```shell
# Distribution of which snippets are seeing rejections — a flat
# distribution across many snippets is path B; a single dominant
# snippet is path A.
kubectl --namespace <jaas-ns> exec deploy/jaas -- \
  wget -qO- http://localhost:8083/metrics \
  | grep jaas_snippet_eval_unavailable_total
```

## Remediation

### Path A — runaway snippet

1. **Suspend the offender** to stop new evals while you fix the snippet:

   ```shell
   kubectl --namespace <ns> patch jsonnetsnippet <name> --type merge \
     --patch '{"spec":{"suspend":true}}'
   ```

2. **Inspect the snippet** to understand the cost. Lowering `--max-stack` is a blunt clamp that rejects pathological recursion before it can leak. The chart's `operator.maxStack` defaults to 500; pull it down to ~200 if the snippet doesn't legitimately need deeper recursion.

3. **Tighten `--evaluation-timeout`** if the snippet's natural completion time is the load-bearing factor. A 5s default lets a 60s pathological eval leak for nearly a minute; dropping to 1s shrinks the worst-case leak window.

4. **Re-enable** after the snippet spec is fixed:

   ```shell
   kubectl --namespace <ns> patch jsonnetsnippet <name> --type merge \
     --patch '{"spec":{"suspend":false}}'
   ```

### Path B — genuine load

1. **Raise the cap** if the operator has CPU headroom. The default is `max(GOMAXPROCS*4, 16)`; double it via the chart:

   ```shell
   helm upgrade <release> <chart> --reuse-values \
     --set arguments.maxConcurrentEvals=64
   ```

   Each in-flight eval pins roughly one CPU when actively running, so the practical ceiling is bounded by node CPU. Past 2-3× GOMAXPROCS the gains drop sharply — more contention, same throughput.

2. **Tune the per-snippet rate limiter** if a small number of snippets dominate the request rate. `--rerender-rate` + `--rerender-burst` cap each snippet's reconcile frequency independent of the global eval cap.

3. **Scale horizontally** if a single replica can't keep up even at the raised cap. The chart's `replicas.max` controls the HPA ceiling; combined with the storage layer's leader election (S3 backend) you get multi-replica HA where every replica reads but only the lease-holder writes.
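
The default from step 1, `max(GOMAXPROCS*4, 16)`, can be worked out before choosing a new value. This is a small arithmetic sketch; the `GOMAXPROCS` value of 8 is an example input and `default_eval_cap` is a hypothetical helper.

```shell
# Sketch: the documented default, max(GOMAXPROCS*4, 16), as shell arithmetic.
# default_eval_cap is a hypothetical helper; 8 is an example GOMAXPROCS.
default_eval_cap() {
  procs=$1
  cap=$((procs * 4))
  if [ "$cap" -lt 16 ]; then
    cap=16
  fi
  echo "$cap"
}

procs=8
current=$(default_eval_cap "$procs")
echo "default cap: $current, doubled: $((current * 2))"
```

For 8 procs the default is 32, so doubling gives the 64 used in the `helm upgrade` example; on 4 procs or fewer the floor of 16 applies.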

## When NOT to raise the cap

If the leak gauge is non-zero AND growing, raising the cap lets more goroutines pile up before the next saturation event. Diagnose path A first. The cap is a backpressure boundary, not a throughput knob.

## Disable the gate (not recommended)

`--max-concurrent-evals=0` disables the gate entirely. The leak gauge keeps working, but rejections never fire — a single runaway snippet can OOM the pod. Use only if you've sized the workload precisely and want to surface saturation purely via the leak gauge.


---

# EvaluationFailed

Source: https://jaas.projects.metio.wtf/runbooks/evaluationfailed/


## Symptom

`READY=False`, `REASON=EvaluationFailed`. The Message contains the raw go-jsonnet diagnostic — file name, line, column, and the underlying error.

## Cause

The snippet failed to evaluate. Three broad categories:

- **Syntax error** — unclosed brace, missing comma, bad indent.
- **Runtime error** — `std.extVar('missing')` for an unset variable, division by zero, `error '...'` thrown explicitly.
- **Import error** — `import 'missing.libsonnet'` resolves to nothing in the snippet's file map or library imports.

## Diagnosis

Read the Message — it names the file and line. Reproduce locally:

```shell
# Pull the snippet's files into a tempdir, then evaluate.
kubectl get jsonnetsnippet <name> --output json | jq -r '.spec.files["main.jsonnet"]' > /tmp/main.jsonnet
jsonnet /tmp/main.jsonnet
```

For sourceRef-based snippets, fetch the tarball:

```shell
SOURCE_URL=$(kubectl get gitrepository <name> --output jsonpath='{.status.artifact.url}')
mkdir -p /tmp/snippet  # tar -C fails if the target directory does not exist
curl -sL "$SOURCE_URL" | tar -xz -C /tmp/snippet
jsonnet /tmp/snippet/<entry-file>
```

## Remediation

Fix the snippet (or its libraries / source) and re-apply.

The diagnostic message can leak the on-disk path of the snippet — fine in-cluster, worth gating behind a flag if exposed to untrusted callers in the future.


---

# EvaluationTimeout

Source: https://jaas.projects.metio.wtf/runbooks/evaluationtimeout/


## Symptom

`READY=False`, `REASON=EvaluationTimeout`. The snippet's eval ran longer than the operator's `--evaluation-timeout`.

## Cause

Snippets are evaluated synchronously per reconcile. The deadline is wall-clock, not CPU — but go-jsonnet has no mid-evaluation cancellation, so a snippet that runs over the deadline still keeps consuming CPU on the operator pod until it returns naturally.

Common triggers:

- a snippet recursing deeper than necessary (try lowering `--max-stack` to surface this as a stack-limit error instead, then optimize)
- a snippet that loads a huge sourceRef tarball and walks it
- a snippet that calls `std.set` / `std.uniq` over a very large array

## Diagnosis

Time it locally:

```shell
time jsonnet /tmp/snippet/<entry-file>
```

If it takes seconds locally, the operator's bound is too tight. If it takes minutes locally, the snippet itself is the problem.
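
That comparison can be scripted. The sketch below times an arbitrary command with whole-second resolution and reports whether it ran longer than a given bound. `exceeds_timeout` is a hypothetical helper, and `sleep 1` only stands in for the real `jsonnet` invocation.

```shell
# Sketch: run a command and report whether it took longer than a bound.
# Whole-second resolution only; exceeds_timeout is a hypothetical helper.
exceeds_timeout() {
  limit=$1
  shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || true    # the command's own exit status is ignored
  end=$(date +%s)
  elapsed=$((end - start))
  echo "elapsed=${elapsed}s limit=${limit}s"
  [ "$elapsed" -gt "$limit" ]
}

# sleep 1 stands in for the jsonnet invocation; 5 mirrors the 5s default.
if exceeds_timeout 5 sleep 1; then
  echo "slower than the operator's bound"
else
  echo "within the operator's bound"
fi
```

Replace `sleep 1` with the `jsonnet` command from the diagnosis step and pass the operator's `--evaluation-timeout` in seconds as the first argument.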

## Remediation

Two paths:

1. **Optimize the snippet.** Memoize repeated work into `local` bindings, narrow the input set, avoid `std.flattenDeepArrays` over deep trees.
2. **Raise the operator's bound.** `--evaluation-timeout=30s` (default `5s`) gives more headroom. Pair with `resources.cpu` headroom in the chart so the slow snippet doesn't starve other reconciles.

For pathological inputs, consider splitting the snippet — render the slow part less often via a separate snippet others source from (see `examples/operator/chained-snippets.yaml`).


---

# ExternalVariableConflict

Source: https://jaas.projects.metio.wtf/runbooks/externalvariableconflict/


## Symptom

`READY=False`, `REASON=ExternalVariableConflict`. The Message names the conflicting key.

## Cause

The snippet's `spec.externalVariables` declares a key that the operator already owns via `--ext-var`, set at the operator level by the cluster admin. Operator keys win by design: they're how the cluster admin pins cluster-scoped values like `cluster`, `region`, `environment` so a tenant snippet can't override them.

## Diagnosis

```shell
# Which keys does the operator own?
kubectl --namespace <jaas-ns> get pod --selector app.kubernetes.io/name=jaas --output yaml | grep -A1 -e "--ext-var="
```

Cross-reference with the snippet's `spec.externalVariables`.
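
With both key sets written one per line, the cross-reference is a set intersection. This is a minimal sketch: the file names and the snippet keys are invented, and `conflicting_keys` is a hypothetical helper.

```shell
# Sketch: list keys present in both the operator's and the snippet's set.
# conflicting_keys is a hypothetical helper; file contents are examples.
conflicting_keys() {
  # $1: file with operator-owned keys, $2: file with snippet keys
  grep -Fx -f "$1" "$2" | sort -u
}

dir=$(mktemp -d)
printf '%s\n' cluster region environment > "$dir/operator-keys"
printf '%s\n' team region dashboardTitle > "$dir/snippet-keys"
conflicting_keys "$dir/operator-keys" "$dir/snippet-keys"
rm -rf "$dir"
```

Here the only overlap is `region`, which is the key to rename or remove in the snippet.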

## Remediation

Rename the conflicting key in the snippet, or remove it from the snippet entirely (the operator-level value flows through automatically).

If the snippet legitimately needs a different value, that's a structural problem — the snippet shouldn't ship with an opinion that overrides a cluster-wide invariant. Re-discuss with the cluster admin.

The validating webhook (`--enable-webhook`) catches this at admission so `kubectl apply` rejects it before it lands. The reconciler enforcing the same rule is a fallback for when admission is bypassed.


---

# High reconcile latency

Source: https://jaas.projects.metio.wtf/runbooks/reconcile-latency/


Linked from the `JaaSReconcileLatencyHigh` alert. Fires when the controller-runtime `controller_runtime_reconcile_time_seconds` histogram p99 exceeds the configured threshold (default 30s) for the alert window.

## Symptom

```text
ALERTS{alertname="JaaSReconcileLatencyHigh", controller="jsonnetsnippet"}
```

- `kubectl get jsonnetsnippet` shows status updates trickling in well after spec changes.
- Operator pod CPU is moderate-to-high but the queue is draining (distinguishes this from [workqueue-saturation.md](workqueue-saturation.md), where the queue itself is growing).

## Cause

Reconcile latency is the wall-clock cost of one `Reconcile()` call. Inside the call, JaaS does (in order):

1. `Get` the snippet from the cache.
2. Run the dependency-cycle BFS (one Get per touched node).
3. Resolve the source (inline files, or sourceRef → Fetcher: source-CR Get + tarball HTTP fetch + tar extract).
4. Resolve libraries (one Get per `LibraryRef`).
5. Evaluate the snippet via go-jsonnet.
6. Publish via the storage backend (`Put`).
7. Status update + ExternalArtifact upsert.

Slow reconciles are almost always one of:

- **Slow `Fetcher`** — a large tarball over a slow network, or a misbehaving source-controller (digest mismatch retries).
- **Heavy jsonnet evaluation** — a snippet that imports lots of large libraries or runs unbounded recursion below the stack limit.
- **Slow `Publisher`** — S3 throttling, a slow PVC, or large rendered output (close to `--max-artifact-bytes`).
- **Cycle-detection blowup** — a dense graph of snippets cross-referencing via `sourceRef`. The BFS is O(V+E) but each visit is a `Get`.

## Diagnosis

```shell
# Where is the time going? OTel spans break Reconcile into sub-stages.
# Requires --tracing-endpoint set on the operator.
kubectl --namespace <jaas-ns> get deploy jaas \
  --output jsonpath='{.spec.template.spec.containers[0].args}' \
  | tr ',' '\n' | grep tracing

# Without tracing: the histograms expose enough to triangulate.
kubectl --namespace <jaas-ns> port-forward svc/jaas-metrics 8083:8083 &
curl -s localhost:8083/metrics | grep -E 'reconcile_time|rendered_bytes'
```

The `jaas_snippet_rendered_bytes` histogram tells you whether a slow Publisher is the likely cause (large outputs) or a slow Fetcher (small outputs, with reconcile time dominated by upstream IO).
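
When no PromQL is at hand, the histogram's `_sum` and `_count` series give a quick mean reconcile time. This is coarser than the p99 the alert evaluates, so treat it as a first signal only. The sketch assumes the standard controller-runtime `controller` label; the sample numbers and the `jsonnetlibrary` controller name are invented, and `mean_reconcile_seconds` is a hypothetical helper.

```shell
# Sketch: mean reconcile time (sum / count) for one controller, read from
# Prometheus text on stdin. Sample numbers below are invented.
mean_reconcile_seconds() {
  awk -v c="$1" '
    index($1, "controller=\"" c "\"") == 0 { next }   # other controllers
    index($1, "controller_runtime_reconcile_time_seconds_sum{") == 1   { sum += $2 }
    index($1, "controller_runtime_reconcile_time_seconds_count{") == 1 { count += $2 }
    END {
      if (count > 0) printf "%.2f\n", sum / count
      else           print "n/a"
    }
  '
}

mean_reconcile_seconds jsonnetsnippet <<'EOF'
controller_runtime_reconcile_time_seconds_sum{controller="jsonnetsnippet"} 120.5
controller_runtime_reconcile_time_seconds_count{controller="jsonnetsnippet"} 50
controller_runtime_reconcile_time_seconds_sum{controller="jsonnetlibrary"} 3
controller_runtime_reconcile_time_seconds_count{controller="jsonnetlibrary"} 30
EOF
```

Feed it the `curl -s localhost:8083/metrics` output from the port-forward above; the sample input yields a mean of 2.41 seconds.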

For a single suspect snippet, force a reconcile under load and observe:

```shell
kubectl --namespace <ns> annotate jsonnetsnippet <name> \
  jaas.metio.wtf/reconcile-at="$(date -u +%FT%TZ)" --overwrite
kubectl --namespace <jaas-ns> logs deploy/jaas --tail=50 | grep <name>
```

## Remediation

- **Slow Fetcher.** Narrow `spec.sourceRef.path` to the subdirectory the snippet actually needs. Tarballs balloon when an entire monorepo is published; the filter trims what JaaS has to download.
- **Heavy eval.** Cap `--max-stack` to bound runaway recursion. Profile the snippet locally via `jsonnet` (the CLI) — the operator's evaluation is identical.
- **Slow Publisher.** See [storage-recovery.md](storage-recovery.md) for backend-specific tuning.
- **Cycle-detection blowup.** Reorganize snippets so the cross-reference graph is shallow; cycle detection visits every reachable node, so a fan-out of N snippets multiplies the cost.
- **OTel for forensics.** Enable `--tracing-endpoint` and the per-stage spans turn this from guessing into measurement. The chart values key is `operator.tracing.endpoint`.

## Prevention

- Pair the alert with `JaaSSnippetArtifactGrowing` ([artifacttoolarge.md](artifacttoolarge.md)). A snippet whose rendered bytes are climbing is almost always headed for a latency spike too.
- For multi-replica HA, leader election keeps only one replica in the reconcile loop — sustained latency on the lease-holder is what matters; standby latency is not measured.


---

# InvalidSpec

Source: https://jaas.projects.metio.wtf/runbooks/invalidspec/


## Symptom

`READY=False`, `REASON=InvalidSpec`. The condition Message names which field is at fault.

## Cause

Spec-level validation that admission should have caught but the reconciler is enforcing as a fallback:

- `spec.entryFile` is empty
- both `spec.files` and `spec.sourceRef` are set (mutually exclusive)
- neither `spec.files` nor `spec.sourceRef` is set
- `spec.entryFile` does not match any key in the resolved file map

## Diagnosis

```shell
kubectl describe jsonnetsnippet <name>
```

Read the Message — it names the field.

## Remediation

Fix the spec and reapply. If the validating webhook is enabled (`--enable-webhook`), `kubectl apply` rejects the invalid spec at admission instead of letting it land and fail later.

If you're seeing `InvalidSpec` on apply through the webhook, that's a bug — file an issue with the rejected manifest.


---

# LibraryNotFound

Source: https://jaas.projects.metio.wtf/runbooks/librarynotfound/


## Symptom

`READY=False`, `REASON=LibraryNotFound`. The Message names the missing library.

## Cause

A `spec.libraries[*]` entry references a `JsonnetLibrary` CR that the operator cannot Get. Common reasons:

- the library CR doesn't exist (typo, wrong namespace, not yet applied)
- the tenant ServiceAccount doesn't have `get` on the library kind in the library's namespace
- the library is in a different namespace and `--no-cross-namespace-refs=true`

## Diagnosis

```shell
# Confirm the library exists.
kubectl --namespace <ns> get jsonnetlibrary <name>

# Test the tenant's RBAC.
kubectl auth can-i get jsonnetlibrary/<name> \
  --as=system:serviceaccount:<ns>:<tenant-sa> --namespace <library-ns>
```

If `can-i` returns `no`, RBAC is the gap.

## Remediation

- create the library CR if it doesn't exist
- grant `get` on `jsonnetlibraries.jaas.metio.wtf` to the tenant SA via a Role + RoleBinding
- if cross-namespace is intended, either move the library into the snippet's namespace or set `--no-cross-namespace-refs=false` cluster-wide


---

# Operator pod not ready

Source: https://jaas.projects.metio.wtf/runbooks/operator-pod-down/


Linked from the `JaaSOperatorPodDown` alert. Fires when at least one JaaS pod has been `Ready=False` for the alert window (default 5m).

## Symptom

```text
ALERTS{alertname="JaaSOperatorPodDown", namespace="<jaas-ns>"}
```

- New JsonnetSnippets stay `Ready=Unknown` indefinitely (no controller reconciling them).
- Existing snippets keep serving stale `ExternalArtifact` content (the storage HTTP server may still respond; reconciliation is the part that stopped).

## Cause

One of the chart's three probes is failing:

- **Liveness** (`/live`) is unconditional 200 — a failure here means the pod's HTTP server itself isn't responding (deadlock, OOM, panic).
- **Readiness** (`/ready`) consults `HealthState`, which goes `false` during `drainBeforeShutdown` or before the listeners bind.
- **Startup** (`/start`) returns 503 until `MarkStarted()` is called — bind failures (port already in use, permission denied) keep the pod stuck here forever.

Frequent causes:

1. **Bind failure** on one of the HTTP servers (jsonnet, management, storage, metrics, webhook). The pod logs a clear "listen tcp: address already in use" or similar at boot.
2. **OOMKilled** — a pathological snippet allocated a huge object; the kubelet killed the pod. `kubectl describe pod` shows `Last State: Terminated, Reason: OOMKilled`.
3. **Image pull failure** — registry rate limit, wrong tag, missing pull secret.
4. **TLS cert missing or unreadable** when `operator.webhook.enabled=true` and the cert-manager Secret hasn't materialized.
5. **Lease contention** that leaves no replica as leader (every replica reconnecting to renew, never holding the lease).

## Diagnosis

```shell
# Which probes are failing? Events tell you.
kubectl --namespace <jaas-ns> describe pod --selector app.kubernetes.io/name=jaas

# Pod logs — the boot sequence prints every listener it binds.
kubectl --namespace <jaas-ns> logs --selector app.kubernetes.io/name=jaas --tail=300

# Compare against the expected listener set.
kubectl --namespace <jaas-ns> get svc --selector app.kubernetes.io/name=jaas
```

For OOM:

```shell
kubectl --namespace <jaas-ns> top pod --selector app.kubernetes.io/name=jaas
kubectl --namespace <jaas-ns> get pod --selector app.kubernetes.io/name=jaas --output yaml \
  | grep -A3 lastState
```

For lease problems (multi-replica only):

```shell
kubectl --namespace <jaas-ns> get lease <release-name>-operator --output yaml
```

`holderIdentity` flipping every renewal interval is a sign of network flake or apiserver pressure — the replicas can't keep the lease stable.
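One way to quantify the flapping is to sample the holder at a fixed interval and count how often it changes. The following is a minimal sketch, not part of the chart: `count_holder_flips` is a hypothetical helper name, and the 15-second sampling interval in the comment is an assumption.

```shell
# count_holder_flips: read one holderIdentity sample per line from stdin
# and print how many times the value changed between consecutive samples.
count_holder_flips() {
  awk 'NR > 1 && $0 != prev { flips++ } { prev = $0 } END { print flips + 0 }'
}

# Example sampling loop (requires cluster access, shown for illustration):
#   for i in 1 2 3 4 5 6; do
#     kubectl --namespace <jaas-ns> get lease <release-name>-operator \
#       --output jsonpath='{.spec.holderIdentity}{"\n"}'
#     sleep 15
#   done | count_holder_flips
```

A result of `0` over several samples suggests a stable leader; a count close to the number of samples matches the flapping pattern described above.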

## Remediation

- **Bind failure.** Free the colliding port. This is often `8080`, where the controller-runtime metrics endpoint default collides with the jsonnet HTTP port; confirm `--metrics-bind-address` is `:8083`.
- **OOMKilled.** Raise `resources.memory`, then identify the runaway snippet (the bench in `internal/operator/bench_test.go` is a regression baseline; the runaway is usually obvious from `jaas_snippet_rendered_bytes`).
- **Image pull.** Standard k8s drill: check secrets, registry, tag.
- **TLS cert.** With `certMode=cert-manager`, confirm the Issuer / Certificate are ready. With `certMode=self-signed`, the operator regenerates on boot — a permission error on the cert-dir mount blocks it.
- **Lease flap.** Try `kubectl --namespace <jaas-ns> delete lease <release-name>-operator` to force a fresh election. If it keeps flapping, the cluster has bigger problems than JaaS.

## Prevention

- Pin `replicas.max: 1` and `LeaderElectionReleaseOnCancel: true` (chart defaults). Multi-replica is only worth it for storage-backed HA — single-replica is the simpler operational story.
- Run the cleanup Job (`operator.cleanupOnDelete.enabled: true`, chart default) so a `helm uninstall` of a wedged operator unwinds the finalizers instead of leaving orphaned snippets.
- Pair this alert with `JaaSControllerWorkqueueDepthHigh` ([workqueue-saturation.md](workqueue-saturation.md)). A pod-down event almost always coincides with a saturated queue from snippets piling up.


---

# Pending

Source: https://jaas.projects.metio.wtf/runbooks/pending/


## Symptom

`kubectl get jsonnetsnippet` shows `READY=Unknown` (or `False` with `REASON=Pending`) immediately after the snippet was created or its spec was updated.

## Cause

The operator has observed the CR but hasn't completed its first reconcile pass yet. Transient by design.

## Diagnosis

```shell
kubectl describe jsonnetsnippet <name>
```

If the timestamp on the `Pending` condition is older than ~30 seconds, the operator is either:

- not running (`kubectl --namespace <jaas-namespace> get pods`)
- backed up on its work queue (check `kubectl --namespace <jaas-namespace> logs deploy/jaas` and the `workqueue_depth` metric)
- not the leader (multi-replica install, `kubectl --namespace <jaas-namespace> get lease` shows the holder)
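The 30-second check can be scripted. This is a sketch under stated assumptions: `condition_age_seconds` is a hypothetical helper, it relies on GNU `date` for timestamp parsing, and it assumes the condition exposes the standard `lastTransitionTime` field.

```shell
# condition_age_seconds: print the age in seconds of an RFC 3339 timestamp
# (for example a condition's lastTransitionTime). An optional second
# argument overrides "now" as epoch seconds. Relies on GNU date parsing.
condition_age_seconds() {
  then_epoch=$(date --utc --date "$1" +%s) || return 1
  now_epoch=${2:-$(date --utc +%s)}
  echo $(( now_epoch - then_epoch ))
}

# Example (requires cluster access):
#   ts=$(kubectl get jsonnetsnippet <name> \
#     --output jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
#   [ "$(condition_age_seconds "$ts")" -gt 30 ] && echo "stuck: check the operator"
```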

## Remediation

If transient, wait. If persistent:

- restart the operator: `kubectl --namespace <jaas-namespace> rollout restart deploy/jaas`
- inspect the operator's logs for errors

If the snippet is stuck in `Pending` because the work queue is saturated, increase replicas (with leader election ON) or raise the rate-limiter budget (`--rerender-rate`, `--rerender-burst`).


---

# RBACDenied

Source: https://jaas.projects.metio.wtf/runbooks/rbacdenied/


## Symptom

```text
kubectl describe jsonnetsnippet <name>
...
Status:
  Conditions:
    Reason:  RBACDenied
    Status:  False
    Type:    Ready
    Message: RBAC denied reading the source CR — grant the tenant ServiceAccount get on the source kind ...
```

Or for a missing CRD:

```text
    Message: source CR's kind is not registered with the apiserver — install the corresponding CRD ...
```

The reconciler logs at warn level and stops engaging backoff for this snippet. The next reconcile happens only when the snippet's spec changes, a referenced library / source CR's status flips, or `spec.interval` ticks — so the workqueue isn't burning cycles on a permanently-failing call.

## Cause

The apiserver returned `Forbidden` on a call the reconciler had to make. Three call sites can surface this:

1. **Source-CR read.** The tenant ServiceAccount lacks `get` on the kind named by `spec.sourceRef.kind`. The fix is on the tenant's `Role` / `RoleBinding`.
2. **Library-CR read.** The tenant SA lacks `get` (and typically `list`) on `jsonnetlibraries` in the snippet's namespace.
3. **ExternalArtifact write.** The tenant SA lacks `create` / `update` / `patch` on `externalartifacts`. This is the publish step — the rendered bytes are computed but the operator can't write them back as the impersonating client.

The `NoMatchError` variant means the apiserver doesn't know about the resource kind at all — typically because the corresponding CRD (usually Flux's source-controller) isn't installed in the cluster.

## Diagnosis

`kubectl describe` shows the operator's classified message. The verbatim apiserver error (`forbidden: ServiceAccount X cannot get resource Y in namespace Z`) is appended after the operator's classification, so you can read off:

- Which SA tried the call (`system:serviceaccount:<namespace>:<sa-name>`)
- Which verb it lacked (`cannot get`, `cannot create`, `cannot patch`)
- Which resource (`gitrepositories.source.toolkit.fluxcd.io`, `jsonnetlibraries.jaas.metio.wtf`, `externalartifacts.source.toolkit.fluxcd.io`)

Verify the SA exists and inspect its current permissions:

```shell
kubectl --namespace <tenant-namespace> get sa <sa-name>
kubectl auth can-i --as=system:serviceaccount:<tenant-namespace>:<sa-name> \
    --namespace <tenant-namespace> \
    <verb> <resource>
```
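Reading the three values off the message and building the `can-i` call can be automated. The sketch below assumes the apiserver's usual wording (`User "system:serviceaccount:<ns>:<sa>" cannot <verb> resource "<resource>" in API group "<group>" in the namespace "<ns>"`); `forbidden_to_can_i` is a hypothetical helper name, and the exact text in your condition message may differ.

```shell
# forbidden_to_can_i: read an apiserver "forbidden" message on stdin and
# print the matching `kubectl auth can-i` command. Prints nothing when the
# message does not follow the assumed wording. Intended for resources in a
# named API group (as all JaaS-related kinds are).
forbidden_to_can_i() {
  sed -n -E 's/.*User "(system:serviceaccount:[^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "([^"]*)" in the namespace "([^"]+)".*/kubectl auth can-i --as=\1 --namespace \5 \2 \3.\4/p'
}

# Example (requires cluster access):
#   kubectl describe jsonnetsnippet <name> | forbidden_to_can_i
```

The helper only prints the command, so you can review it before running it.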

For the `NoMatchError` variant:

```shell
# Verify the CRD is actually installed:
kubectl get crd | grep -E 'source.toolkit.fluxcd.io|jaas.metio.wtf'

# If source-controller's CRDs are missing, install Flux:
# https://fluxcd.io/flux/installation/
```

## Remediation

Grant the missing verb to the tenant SA. The minimum verbs JaaS expects are documented in the [Tenancy and RBAC](https://jaas.projects.metio.wtf/usage/tenancy-and-rbac/#the-tenant-role) guide. Typical fix:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant
rules:
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [gitrepositories, ocirepositories, buckets, externalartifacts]
    verbs: [get]
  - apiGroups: [jaas.metio.wtf]
    resources: [jsonnetlibraries]
    verbs: [get, list]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: <tenant-namespace>
  name: jaas-tenant
subjects:
  - kind: ServiceAccount
    name: <sa-name>
    namespace: <tenant-namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jaas-tenant
```

After the RBAC change, force the next reconcile (the snippet's last spec edit doesn't auto-retrigger because the failure was non-transient):

```shell
kubectl annotate jsonnetsnippet <name> jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite
```

For the missing-CRD case, installing the CRD fires the operator's `crdWatcher`, which engages the watch automatically — no manual nudge needed.

## Why this is non-transient

`Forbidden` doesn't recover by retry. The cluster operator (or whoever owns the tenant's RBAC) has to grant the verb. Retrying every 16 minutes would pile up wasted API calls and obscure the workqueue's signal. The non-transient classification lets the workqueue depth metric remain meaningful — anything on it is genuinely live work.

`NoMatchError` is the same shape: until the CRD is installed, the kind doesn't exist. Retry can't conjure it.
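To make the 16-minute figure concrete, the following sketch reproduces the retry schedule of an exponential per-item rate limiter. It assumes controller-runtime's default parameters (5 ms base delay, doubling per failure, capped at 1000 s); whether JaaS overrides these is not stated here, and `backoff_ms` is a hypothetical helper.

```shell
# backoff_ms: delay in milliseconds after the Nth consecutive failure
# (N starts at 0), doubling from 5 ms and capped at 1000 s.
backoff_ms() {
  delay=5
  i=0
  while [ "$i" -lt "$1" ] && [ "$delay" -lt 1000000 ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt 1000000 ]; then
    delay=1000000
  fi
  echo "$delay"
}

# Print a few points on the curve.
for n in 0 5 10 15 18; do
  echo "failure $n: $(backoff_ms "$n") ms"
done
```

Under these assumptions the cap (1000 s, about 16 min 40 s) is reached after roughly 18 consecutive failures, after which a permanently failing item would be retried at that cadence indefinitely.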


---

# Self-signed webhook cert renewal failing

Source: https://jaas.projects.metio.wtf/runbooks/webhook-cert-renewal/


Fires when `jaas_webhook_cert_renewal_failures_total` has increased above the configured per-hour threshold. The `Renewer` background goroutine rotates the self-signed TLS material every `Validity / 3` (typically every few months for a year-long cert). When it can't, the existing cert keeps working until its natural expiry — at which point the apiserver stops trusting the chain and **every JsonnetSnippet admission fails cluster-wide with `x509` errors**.

## Symptom

- `JaaSWebhookCertRenewalFailing` alert is firing (severity: critical).
- Operator pod logs carry repeated `Self-signed webhook cert renewal failed` warnings at the `Renewer.Interval` cadence.
- `kubectl describe validatingwebhookconfiguration <name>` shows a `caBundle` that hasn't rotated since the failures started.
- The pod stays `Ready=True` — the renewer's failures don't gate the readiness probe.

## Diagnosis

The most common causes, in order of frequency:

### Cause A — RBAC drift on the named VWC

The operator's ClusterRole pins `resourceNames: [<VWCName>]` on the `validatingwebhookconfigurations` patch verb. A chart upgrade that changes `operator.webhook.vwcName` (or a manual chart edit) leaves the running pod patching a name it no longer has permission for.

```shell
kubectl auth can-i patch validatingwebhookconfiguration/<vwc-name> \
  --as=system:serviceaccount:<namespace>:<operator-sa>
```

If the answer is "no", the chart's `operator-cluster` ClusterRole needs the current VWC name added to `resourceNames` (or the running pod restarted to pick up the new name).

### Cause B — VWC renamed out from under the operator

A separate controller (admission policy automation, GitOps drift correction) renamed the VWC. The operator is patching a stale name.

```shell
kubectl get validatingwebhookconfigurations \
  --selector 'app.kubernetes.io/instance=<release-name>'
```

If the live name differs from the operator's `--webhook-validating-config-name` flag, redeploy the operator with the correct flag or rename the VWC back.

### Cause C — `CertDir` gone read-only

The chart mounts `CertDir` as an `emptyDir` by default. A `kubectl apply` that adds a `readOnlyRootFilesystem: true` security context or a sidecar that re-mounts the volume can break writes.

```shell
kubectl --namespace <ns> exec <operator-pod> -- ls -l /tmp/k8s-webhook-server/serving-certs/
kubectl --namespace <ns> exec <operator-pod> -- touch /tmp/k8s-webhook-server/serving-certs/.write-probe
```

If the touch fails, the security-context or volume mount needs fixing.

## Remediation

1. **Fix the root cause** (RBAC, name, or mount).
2. **Roll the operator pod** to force a fresh bootstrap of the cert and a re-patch of the VWC:

   ```shell
   kubectl --namespace <ns> rollout restart deployment <operator-deployment>
   ```

   The new pod's bootstrap path goes through the dual-CA union (DD8), so existing replicas stay trusted across the rotation.

3. **Verify renewal is healthy** after the bounce — the `jaas_webhook_cert_renewal_failures_total` counter should stop increasing, and the alert clears once the `for:` window passes.

## When to consider switching to cert-manager

If the self-signed renewer keeps tripping over your environment's RBAC story or pod-security policies, the chart supports `operator.webhook.certMode: cert-manager` — cert-manager handles the rotation and the operator mounts the resulting secret. Trade-off: requires cert-manager installed and an Issuer configured.


---

# ServiceAccountMissing

Source: https://jaas.projects.metio.wtf/runbooks/serviceaccountmissing/


## Symptom

`READY=False`, `REASON=ServiceAccountMissing`.

## Cause

The snippet omitted `spec.serviceAccountName` AND the operator was started without `--default-service-account`. The operator refuses to reconcile a snippet with no effective ServiceAccount because every reconcile mints a tenant token from that SA — without one, there's nothing to impersonate.

## Diagnosis

```shell
kubectl get jsonnetsnippet <name> --output jsonpath='{.spec.serviceAccountName}'
```

Empty? Either the snippet must set it, or the cluster operator must configure a default.

## Remediation

Pick one:

1. **Snippet-side (preferred for multi-tenant setups):** set `spec.serviceAccountName: <existing-sa>` on every snippet. Each tenant uses its own SA → least-privilege impersonation.
2. **Cluster-side (single-tenant clusters):** start the operator with `--default-service-account=<sa-name>`. Every snippet without an explicit SA impersonates this one. The default SA must exist in **every snippet's namespace** — the operator looks it up per-reconcile.


---

# SourceFetchFailed

Source: https://jaas.projects.metio.wtf/runbooks/sourcefetchfailed/


## Symptom

`READY=False`, `REASON=SourceFetchFailed`. The Message describes what went wrong (HTTP error, digest mismatch, tarball too large, etc.).

## Cause

The Fetcher resolved the source CR and started downloading the artifact, but the download itself failed. Three subcategories:

- **HTTP failure** — connection refused, 5xx from the source-controller endpoint, TLS handshake error
- **Digest mismatch** — the bytes don't hash to `status.artifact.digest`. Possible truncation or in-flight tampering
- **Tarball oversized** — extracted bytes exceed `MaxArchiveBytes` (default 64 MiB)

## Diagnosis

Check the source CR's `status.artifact.url` is reachable from the operator pod:

```shell
kubectl exec deploy/jaas -- wget -O- <status.artifact.url> | wc -c
```

A connection refused means the storage endpoint of source-controller (or another publisher) is unreachable — usually a NetworkPolicy issue.

For digest mismatches, the source CR has likely been republished mid-fetch — the next reconcile typically succeeds.

For oversized tarballs, the snippet's `spec.sourceRef.path` filter is too broad — narrow it so only the files the snippet actually `import`s come through.
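To rule out a persistent digest mismatch, you can download the artifact once and compare its hash against `status.artifact.digest` by hand. A minimal sketch, assuming the digest uses the `sha256:<hex>` form and that `sha256sum` is available where you run it; `verify_digest` is a hypothetical helper.

```shell
# verify_digest: compare a file's SHA-256 against a "sha256:<hex>" digest
# string. Returns 0 on match, 1 on mismatch, 2 on an unsupported algorithm.
verify_digest() {
  file=$1
  expected=$2
  case "$expected" in
    sha256:*) want=${expected#sha256:} ;;
    *) echo "unsupported digest: $expected" >&2; return 2 ;;
  esac
  got=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$got" = "$want" ]; then
    echo "digest OK"
  else
    echo "digest MISMATCH: got sha256:$got" >&2
    return 1
  fi
}

# Example (requires cluster access and a downloaded artifact.tar.gz):
#   digest=$(kubectl get <kind> <source-name> \
#     --output jsonpath='{.status.artifact.digest}')
#   verify_digest artifact.tar.gz "$digest"
```

A mismatch that reproduces on a fresh download points at the publisher or the network path rather than a republish race.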

## Remediation

- **Network**: fix the NetworkPolicy / DNS / TLS that's blocking the fetch
- **Digest**: re-reconcile (manual: `kubectl annotate jsonnetsnippet <name> jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite`)
- **Oversized**: narrow `spec.sourceRef.path` to the subdirectory the snippet needs, or split the source repo


---

# SourceNotReady

Source: https://jaas.projects.metio.wtf/runbooks/sourcenotready/


## Symptom

`READY=False`, `REASON=SourceNotReady`. The Message names the source CR (`GitRepository/foo`, `ExternalArtifact/bar`, etc.).

## Cause

The Flux source CR the snippet references exists but its own `status.conditions[Ready]` is not yet True (or `status.artifact` is empty). The operator refuses to fetch from a source it can't trust as ready.

For chained snippets specifically: the upstream snippet may have failed reconciliation, so its ExternalArtifact is stale or unpopulated.

## Diagnosis

```shell
kubectl describe <kind> <source-name>
# Look for the Ready condition and any error messages.
```

For Flux sources, also check the source-controller logs:

```shell
kubectl --namespace flux-system logs deploy/source-controller | grep <source-name>
```

## Remediation

Fix the upstream source. The operator watches Flux source kinds and will re-reconcile the snippet automatically when the source flips to Ready=True — no manual nudge required.


---

# SourceRefNotYetSupported

Source: https://jaas.projects.metio.wtf/runbooks/sourcerefnotyetsupported/


## Symptom

`READY=False`, `REASON=SourceRefNotYetSupported`.

## Cause

The snippet sets `spec.sourceRef` but the operator was built without a Fetcher wired in. In practice this is a mis-deployment: the production binary always wires `sources.New()`. Seeing this in a real cluster means you're running:

- a test/dev binary
- a custom build where `defaultBuilder` was modified
- a future code path that hasn't enabled sourceRef yet

## Diagnosis

```shell
kubectl logs deploy/jaas | grep -i "fetcher"
```

If the operator logs no Fetcher initialization, the binary is incomplete.

## Remediation

Use a release binary, or convert the snippet to `spec.files` inline as a temporary workaround.


---

# Storage backend recovery

Source: https://jaas.projects.metio.wtf/runbooks/storage-recovery/


Not tied to a single `Reason` — this page covers what to do when the artifact store itself is degraded (PVC lost, S3 endpoint unavailable, the storage HTTP server is down). Downstream Flux consumers (kustomize-controller, helm-controller, grafana-operator) dereference `ExternalArtifact.status.artifact.url` to fetch tarballs; when that URL stops returning bytes, dependent resources stall.

## Symptom

One or more of:

- Downstream Flux consumers report `404 Not Found` or `connection refused` against the JaaS storage URL.
- `kubectl get externalartifact --all-namespaces` shows resources whose URL is unreachable from the consumer pods.
- The operator pod is healthy (Ready=True on snippets), but the storage Service is unresponsive.
- `helm upgrade` of the chart from `persistence.enabled: false` to `true` — or vice versa — caused a gap.

## Triage: which backend are you running?

```shell
kubectl --namespace <jaas-ns> get deploy jaas \
  --output jsonpath='{.spec.template.spec.containers[0].args}' \
  | tr ',' '\n' | grep -E 'storage-backend|storage-path|s3-endpoint'
```

- `--storage-backend=local` → filesystem behind `--storage-path`. Either an emptyDir (chart default) or a PVC.
- `--storage-backend=s3` → an external S3-compatible bucket; the storage HTTP server in-pod is a thin streaming proxy over `minio-go`.

## Filesystem backend

### PVC lost or replaced

Symptom: every `ExternalArtifact` URL returns 404 even though the snippet's Ready=True. The Publisher writes idempotently on every reconcile, so making the operator re-render every snippet is the fix:

```shell
# Roll the operator — the cache is rebuilt from the apiserver and every
# snippet is reconciled. Each reconcile re-runs the Publisher, which
# writes the tarball back to disk. With a clean PVC, the gap closes in
# one reconcile loop.
kubectl --namespace <jaas-ns> rollout restart deploy/jaas
```

If reconciles do not produce tarballs again, force a reconcile per snippet:

```shell
kubectl annotate --all-namespaces jsonnetsnippet --all \
  jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite
```

The window between PVC loss and the first re-render is the only outage downstream consumers see. With `replicas.max: 1` (chart default) that window is bounded by the rollout time; with multi-replica HA + RWX PVC, the lease-holder writes immediately and the gap is sub-second.

### emptyDir reset (pod restart)

`persistence.enabled: false` is fine for low-stakes deployments but every pod restart re-renders every snippet. The "fix" is to enable persistence:

```shell
helm --namespace <jaas-ns> upgrade jaas oci://ghcr.io/metio/helm-charts/jaas \
  --reuse-values \
  --set operator.storage.persistence.enabled=true \
  --set operator.storage.persistence.size=10Gi
```

After the upgrade, follow the PVC-lost steps above to repopulate the new volume.

### Storage HTTP server unreachable but operator healthy

Diagnose:

```shell
kubectl --namespace <jaas-ns> port-forward svc/jaas-storage <port>:8082 &
curl -fsSL http://localhost:<port>/<namespace>/<snippet>/<rev>.tar.gz | wc -c
```

If port-forward works but in-cluster fetches fail, look at NetworkPolicy:

```shell
kubectl get networkpolicy --all-namespaces | grep -i jaas
```

The chart's optional NetworkPolicy locks the storage port to a single source-controller selector. If your Flux install lives elsewhere or carries different labels, the NetworkPolicy will silently drop the traffic. Either widen `networkPolicy.fromSourceControllerSelector` or disable the NetworkPolicy on this chart and rely on a cluster-wide policy.

## S3 backend

### Endpoint unreachable / 5xx from the provider

The pod-side storage HTTP server is a proxy. When the upstream S3 endpoint is down, the proxy returns 502/504 and downstream Flux consumers retry with backoff. Operator pod health is unaffected.

Diagnose:

```shell
# Pull a recent tarball directly to confirm it's the upstream
kubectl --namespace <jaas-ns> exec deploy/jaas -- \
  wget -O- http://localhost:8082/<namespace>/<snippet>/<rev>.tar.gz | wc -c
```

If the in-pod fetch fails too, check the operator logs for `minio-go` errors. Auth problems (expired session token, rotated access key) show up here distinctly from network problems.

### Bucket gone or wrong prefix

If the bucket was emptied or the `--s3-prefix` changed, the proxy returns 404 even though the snippet is Ready. Re-render every snippet to repopulate:

```shell
kubectl annotate --all-namespaces jsonnetsnippet --all \
  jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite
```

The Publisher writes idempotently — running this against a working bucket is safe.

### Credentials rotated

With static credentials (`--s3-access-key` / `--s3-secret-key` / inline chart values), a rotation requires a Deployment restart for `minio-go` to pick up the new values. With IAM/IRSA, the discovery chain re-reads at request time — the operator picks the new identity up automatically.

Force the new keys to take effect:

```shell
kubectl --namespace <jaas-ns> rollout restart deploy/jaas
```

## Disk full (`ENOSPC`)

When the volume backing `--storage-path` fills up, `Store.Put` returns the kernel's `ENOSPC` verbatim → `Publisher.Publish` wraps it → no specific sentinel matches → classified as transient `ReasonSourceFetchFailed` → controller-runtime backoff retries forever at the ~16 min cap. The operator pod stays healthy; every snippet using local storage starts looping.

### Symptom

- Multiple snippets simultaneously flip to `Ready=False` with messages mentioning `no space left on device`.
- `JaaSControllerWorkqueueDepthHigh` alert fires (the backoff queue saturates).
- `kubectl --namespace <jaas-ns> exec deploy/jaas -- df -h /var/lib/jaas/artifacts` shows the volume at 100%.
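For scripted checks, the usage figure can be extracted as a number. This sketch assumes the POSIX `df -P` output layout (one header row, then one data row with the percentage in the fifth column); `volume_use_percent` is a hypothetical helper and the 90% threshold in the comment is only an example.

```shell
# volume_use_percent: read `df -P <path>` output on stdin and print the
# Capacity column of the data row as a bare integer (e.g. 100).
volume_use_percent() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example (requires cluster access):
#   pct=$(kubectl --namespace <jaas-ns> exec deploy/jaas -- \
#     df -P /var/lib/jaas/artifacts | volume_use_percent)
#   [ "$pct" -ge 90 ] && echo "artifact volume at ${pct}%"
```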

### Recovery

1. **Free space.** Either resize the PVC (if `operator.storage.persistence.enabled: true`), increase `operator.storage.sizeLimit` (emptyDir), or prune retained revisions:

   ```shell
   # Lower spec.history on noisy snippets so the next reconcile prunes
   # older revisions. The Publisher's Prune step removes everything
   # outside the keep-set, freeing space proportional to the change.
   kubectl patch jsonnetsnippet <name> --type=merge --patch '{"spec":{"history":1}}'
   ```

   For an immediate flush, force-prune by removing the artifact directory of a snippet you're certain doesn't need its history:

   ```shell
   kubectl --namespace <jaas-ns> exec deploy/jaas -- \
       rm -rf /var/lib/jaas/artifacts/<namespace>/<expendable-snippet>
   ```

2. **Drive reconciliation.** The backoff cap is ~16 min, so failing snippets retry within that window automatically. To force immediate re-render:

   ```shell
   # Annotate every snippet that flipped to Ready=False:
   kubectl get jsonnetsnippets --all-namespaces \
     --output jsonpath='{range .items[?(@.status.conditions[?(@.type=="Ready")].status=="False")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' \
     | while IFS=/ read -r ns name; do
         kubectl --namespace "$ns" annotate jsonnetsnippet "$name" \
           jaas.metio.wtf/reconcile-at="$(date -u +%FT%TZ)" --overwrite
       done
   ```

3. **Prevention.** Set `operator.storage.maxArtifactBytes` to cap individual snippet renders before they hit the disk. Use `JaaSSnippetArtifactGrowing` (opt-in PrometheusRule alert) to catch creep before saturation.

## OOM during render (kubelet kills the operator pod mid-publish)

A snippet that evaluates into a multi-MB JSON tree can push the operator past its `resources.memory` limit. The kubelet SIGKILLs the pod. Effects:

- The pod restarts cleanly. Probes flip back to Ready within seconds.
- The leader-election lease is dropped on the killed pod's process death; the next replica picks it up immediately (or, on single-replica installs, the new pod re-acquires).
- **Mid-write `.tmp` files** in the local store are orphaned because the `Store.Put` was interrupted between `Create` and the atomic `Rename`. The background `Sweep` (default every 10 min, configurable via `operator.storage.sweep.interval`) cleans them up once their `ModTime` falls outside the `maxTmpAge` window.
- **The snippet that triggered the OOM** keeps failing in the same way on the next reconcile because its rendered output is what blew memory in the first place. The pod loop-restarts until either the snippet is fixed or its memory cost falls below the limit.

### Symptom

- Operator pod restarts every few minutes; `kubectl describe pod` shows `Last State: Terminated, Reason: OOMKilled`.
- `JaaSOperatorPodDown` alert fires (the restart window is shorter than the recovery, so probes flap).
- One specific snippet correlates with each restart.

### Diagnose

The killing snippet is whichever the operator was reconciling when memory peaked. Easiest way to identify:

```shell
kubectl --namespace <jaas-ns> logs deploy/jaas --previous --tail=200 \
    | grep -B2 -i 'reconcil\|publish' | tail -30
```

The last `Reconcile` log line before the kill names the snippet. Confirm via `jaas_snippet_rendered_bytes` if the operator's metrics endpoint was reachable before the kill (the histogram captures bytes per Synced reconcile; a runaway snippet stands out).

### Remediate

1. **Cap the runaway snippet.** Set `operator.storage.maxArtifactBytes` cluster-wide to refuse renders past a threshold (e.g., `16777216` for 16 MiB). The Publisher fails the snippet with `ReasonArtifactTooLarge` instead of attempting the write. The operator pod stops OOM-restarting.

2. **Raise operator memory.** The chart's `resources.memory` default is conservative (`64Mi`); a cluster with large rendered artifacts may need `256Mi` or more. Update via:

   ```shell
   helm --namespace <jaas-ns> upgrade jaas oci://ghcr.io/metio/helm-charts/jaas \
     --reuse-values --set resources.memory=256Mi
   ```

3. **Wait for `Sweep`.** The orphan `.tar.gz.tmp` files clear automatically once the surrounding issue is fixed and `maxTmpAge` (default 30 min) elapses. `jaas_storage_sweep_failures_total` flags any persistent issue.

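For S3 backends, OOM during a multipart upload leaves an incomplete upload at the S3 endpoint. Some providers expire these automatically; AWS S3 keeps the uploaded parts until the upload is aborted, so configure a bucket lifecycle rule with `AbortIncompleteMultipartUpload` there. No JaaS-side action needed.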
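The `operator.storage.maxArtifactBytes` value from step 1 is expressed in bytes. A small sketch of the conversion, with `mib_to_bytes` as a hypothetical helper:

```shell
# mib_to_bytes: convert a size in MiB to bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# 16 MiB, the example threshold from step 1.
mib_to_bytes 16
```

The result can be passed to the chart, for example by adding `--set operator.storage.maxArtifactBytes="$(mib_to_bytes 16)"` to a `helm upgrade` invocation like the one in step 2.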

## Multi-replica considerations

With leader election on (the chart default when operator mode is enabled), only the lease-holder writes to storage. A storage-incident on the lease-holder is the worst case: the standby reads but cannot fill the gap until the lease transfers. To force a handover during a storage incident on one replica:

```shell
kubectl --namespace <jaas-ns> delete lease <release-name>-operator
```

The next replica acquires the lease within `LeaseDuration` (15s default), and its Publisher writes against its own (presumably healthy) view of the backend.

## Prevention

- Use `persistence.enabled: true` in production. Default-off is for quick demos.
- Run the chart's opt-in PrometheusRule (`operator.metrics.prometheusRule.enabled: true`) — `JaaSSnippetArtifactGrowing` catches runaway tarballs before they fill the PVC.
- Set `operator.storage.maxArtifactBytes` to cap pathological snippets at publish time, before they are written to disk.
- For S3, configure a bucket lifecycle policy that does not delete tarballs the operator still considers live. The Publisher's `Prune` only deletes revisions the snippet's `status.history` no longer references.

## `JaaSStorageSweepFailures` alert

Linked from the alert by name. The sweep is a background GC that removes orphaned `.tar.gz.tmp` residue left by Puts whose process died after the tmpfile landed but before the rename. The reconcile hot path is unaffected — Put still works — but stale `.tmp` files accumulate until the underlying issue is fixed.

**Symptom:** `jaas_storage_sweep_failures_total` increases over time; `JaaSStorageSweepFailures` alert fires after >3 failures/hour (configurable).

**Diagnose:**

```shell
# Operator logs carry the underlying sweep error:
kubectl --namespace <jaas-ns> logs deploy/jaas --tail=200 | grep "Storage sweep failed"

# For local backend: check the volume's free space + permissions.
kubectl --namespace <jaas-ns> exec deploy/jaas -- df -h /var/lib/jaas/artifacts
kubectl --namespace <jaas-ns> exec deploy/jaas -- ls -la /var/lib/jaas/artifacts

# For S3 backend: sweep is a no-op (Put is atomic, no .tmp residue),
# so this alert firing on S3 is a wiring bug. Confirm backend:
kubectl --namespace <jaas-ns> get deploy jaas --output jsonpath='{.spec.template.spec.containers[0].args}' | tr ',' '\n' | grep storage-backend
```

**Remediate:**

- Disk full → increase the PVC size or shrink `operator.storage.maxArtifactBytes`.
- Permission errors → ensure the operator's `securityContext.runAsUser` matches the PVC's filesystem ownership; reset with `chown` on a one-shot Job.
- S3 backend → this alert should not fire at all because Sweep is a no-op there; if it does, treat it as a wiring bug and confirm the backend with the command above.

Manual cleanup once the underlying issue is fixed:

```shell
kubectl --namespace <jaas-ns> exec deploy/jaas -- find /var/lib/jaas/artifacts -name '*.tar.gz.tmp' -mmin +30 -delete
```
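Before deleting, it can be worth listing what the cleanup would remove. A minimal sketch of a dry run, assuming a `find` that supports `-mmin` (GNU and BusyBox do); `list_stale_tmp` is a hypothetical helper and the 30-minute default mirrors the command above.

```shell
# list_stale_tmp: print *.tar.gz.tmp files under a directory whose mtime is
# older than the given number of minutes (default 30). Lists only; nothing
# is deleted.
list_stale_tmp() {
  find "$1" -type f -name '*.tar.gz.tmp' -mmin "+${2:-30}" -print
}

# Example inside the operator pod (requires cluster access):
#   kubectl --namespace <jaas-ns> exec deploy/jaas -- \
#     find /var/lib/jaas/artifacts -type f -name '*.tar.gz.tmp' -mmin +30 -print
```

Fresh `.tmp` files belong to writes that may still be in flight, which is why the age filter matters.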

## `WithdrawForced` event on snippet deletion

If a snippet stuck in `Terminating` carries a `Warning WithdrawForced` Kubernetes Event, the operator has already done what it could — the finalizer was dropped after `--max-withdraw-wait` (default 1h) of failing Withdraws against the backend, and the snippet itself is GC'd. The tarball it owned is now orphaned in storage. To clean up:

```shell
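# Read the elapsed time + last backend error from the event message.
# While the snippet still exists:
kubectl describe jsonnetsnippet <name>
# Once it has been garbage-collected, query the Event itself:
kubectl get events --field-selector involvedObject.name=<name>,reason=WithdrawForced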
# Locate the orphan in the configured backend:
#   local:  <--storage-path>/<namespace>/<name>/<rev>.tar.gz
#   s3:     <--s3-prefix>/<namespace>/<name>/<rev>.tar.gz
# Remove it once the backend is reachable again.
```

A force-drop should be rare — it means the backend was broken for the full wait window. Investigate **why** before lowering `maxWithdrawWait`: aggressive timeouts make transient apiserver/S3 incidents cause orphans you'd otherwise have recovered from naturally.

## Related runbooks

- [artifacttoolarge.md](artifacttoolarge.md) — one snippet's output exceeds the cap (different symptom: snippet Ready=False, not "URL unreachable")
- [sourcefetchfailed.md](sourcefetchfailed.md) — JaaS *consuming* an upstream artifact, not its own storage


---

# Suspended

Source: https://jaas.projects.metio.wtf/runbooks/suspended/


## Symptom

`READY=False`, `REASON=Suspended`. The snippet's `spec.suspend` is set to `true`.

## Cause

An operator (or automation) paused reconciliation for this snippet, typically to investigate a downstream issue without the artifact being rewritten underneath them. The previously-published `ExternalArtifact` and the on-disk tarball are left intact — downstream Flux consumers continue serving the last successful render.

This is a normal, intentional state. It is not a failure.

## Diagnosis

```shell
kubectl get jsonnetsnippet <name> --output jsonpath='{.spec.suspend}'
```

If the value is `true`, the suspension is set on the spec. Check `kubectl describe` for the last condition transition timestamp to see when it happened.

## Remediation

To resume reconciliation:

```shell
kubectl patch jsonnetsnippet <name> --type=merge --patch '{"spec":{"suspend":false}}'
```

Or remove the field entirely:

```shell
kubectl edit jsonnetsnippet <name>
# delete the `suspend: true` line under spec
```

The next reconcile picks up the snippet's current spec and republishes if anything has drifted.


---

# Synced

Source: https://jaas.projects.metio.wtf/runbooks/synced/


## Symptom

`kubectl get jsonnetsnippet` shows `READY=True` with `REASON=Synced`. This is the healthy state — no action required.

## Cause

The most recent reconcile pass completed end-to-end: source resolved, libraries resolved, eval succeeded, tarball published, ExternalArtifact upserted.

## Diagnosis

To inspect the published artifact:

```shell
kubectl get externalartifact <snippet-name> --output yaml
```

The `status.artifact.url` points at the operator's storage HTTP server. Curl it from a pod in the cluster to confirm the bytes match:

```shell
# No --tty: a terminal would rewrite line endings and corrupt the binary stream.
kubectl run --rm --stdin --quiet --restart=Never tmp --image=docker.io/curlimages/curl -- \
  curl -sL <status.artifact.url> | tar -tzvf -
```

## Remediation

None — this is the healthy state.


---

# Watch-layer silent failure

Source: https://jaas.projects.metio.wtf/runbooks/operator-watch-silent/


Not tied to a per-snippet `Reason`. This page covers the one RBAC-denial path the reconciler cannot surface itself: when the **operator's own** ClusterRole is missing a verb on a watched resource kind, controller-runtime's informer fails to start its watch, logs errors, and keeps retrying without surfacing anything in status. The reconciler never sees the failure, and no snippet's status condition will tell you about it.

If a per-snippet runbook (`rbacdenied.md`, `sourcefetchfailed.md`, `sourcenotready.md`) doesn't match the symptoms, this is where to look next.

## Symptom

- Snippets that worked yesterday stop receiving watch-driven re-renders. They still reconcile on edits or `spec.interval` ticks, but not on upstream source changes.
- `kubectl describe jsonnetsnippet` shows healthy or stale state — never `Reason=RBACDenied` or any other failure.
- Operator pod is `Ready=True`, all probes pass.
- A Flux `GitRepository` (or `OCIRepository` / `Bucket` / `ExternalArtifact` / `JsonnetLibrary`) advances its `status.artifact` but no JaaS reconcile fires.

## Why JaaS can't tell you directly

controller-runtime's informer is what watches resource kinds at the apiserver. If the operator SA lacks `list`/`watch` on a kind:

1. The informer's initial LIST returns `Forbidden`.
2. controller-runtime logs `Failed to watch *v1.X: forbidden: ...` at error level.
3. The informer retries with exponential backoff — forever.
4. The reconciler's reconcile loop never gets events from that kind.
5. The reconciler itself is unaware that the watch is non-functional. The "no events arriving" condition is indistinguishable from "no actual changes upstream."

This is the one diagnostic surface the operator can't unify with its other RBAC-denial paths (Fetcher / library Get / Publisher write — all per-reconcile and surfaced via `Reason=RBACDenied`).

## Diagnosis

The smoking gun is in the operator's logs:

```shell
kubectl --namespace <jaas-ns> logs deploy/jaas --tail=2000 \
  | grep -E 'Failed to watch|"reflector.go"' \
  | head -30
```

Expected output if the watch layer is healthy: nothing.

If broken, you'll see lines like:

```text
E0610 12:34:56.789  reflector.go:227 "Failed to watch" err="failed to list *v1.GitRepository: ... forbidden: ServiceAccount \"jaas\" cannot list resource \"gitrepositories\" in API group \"source.toolkit.fluxcd.io\" at the cluster scope" type="*v1.GitRepository"
```

The error names the SA, verb, resource, and API group — that's the exact gap to close.
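
When the log holds many such lines, it can help to reduce them to the distinct verb and resource pairs. The filter below is a sketch that assumes the message wording shown in the sample line above (`cannot <verb> resource "<resource>" in API group "<group>"`, with or without backslash-escaped quotes); the function name `denied_watches` is hypothetical.

```shell
# Read operator log lines on stdin and print one "<verb> <resource>.<group>"
# line per distinct denial. Lines that do not match the pattern are dropped.
denied_watches() {
  sed -nE 's/.*cannot ([a-z]+) resource \\?"([^"\\]+)\\?" in API group \\?"([^"\\]*)\\?".*/\1 \2.\3/p' \
    | sed 's/\.$//' \
    | sort -u
}

# Usage (not executed here):
#   kubectl --namespace <jaas-ns> logs deploy/jaas --tail=2000 | denied_watches
```

Each output line has the same shape as the arguments `kubectl auth can-i` expects, so it can be checked directly in the next step.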

Check what the operator SA can actually do:

```shell
kubectl auth can-i list gitrepositories.source.toolkit.fluxcd.io \
    --as=system:serviceaccount:<jaas-ns>:jaas
```
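
To check every watched Flux source kind in one pass, the same command can be looped over kinds and verbs. This is a sketch: the function name `check_watch_rbac` is hypothetical, the ServiceAccount name defaults to `jaas` as in the command above, and `JsonnetLibrary` is left out because its API group is not stated on this page. `--all-namespaces` is added on the assumption that the watch is cluster-scoped, as the sample log line suggests.

```shell
# Print "<verb> <resource> <yes|no>" for list and watch on each Flux source kind.
check_watch_rbac() {
  ns="$1"            # namespace of the operator ServiceAccount
  sa="${2:-jaas}"    # ServiceAccount name; "jaas" is assumed as the default
  for resource in gitrepositories ocirepositories buckets externalartifacts; do
    for verb in list watch; do
      # can-i exits non-zero on "no", so keep going either way.
      answer=$(kubectl auth can-i "$verb" "${resource}.source.toolkit.fluxcd.io" \
        --all-namespaces \
        --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null) || true
      printf '%-5s %-18s %s\n' "$verb" "$resource" "${answer:-error}"
    done
  done
}
```

Run it as `check_watch_rbac <jaas-ns>`; any `no` line marks a verb the ClusterRole still lacks.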

Compare against the chart-rendered ClusterRole:

```shell
kubectl get clusterrole <release>-operator --output yaml | grep -A2 source.toolkit.fluxcd.io
```

## Remediation

The operator's ClusterRole verbs are defined in the chart's `templates/clusterrole-operator.yaml` (in the metio/helm-charts repo, under `charts/jaas/`). Three causes warrant separate fixes:

### 1. Chart upgraded with `operator.rbac.create: false`

Someone disabled chart-rendered RBAC (`operator.rbac.create: false`) and the external RBAC source missed a verb. Either re-enable chart-rendered RBAC, or update whatever owns the ClusterRole to grant the missing verbs.

### 2. Manual chart edit removed verbs

A `kubectl edit clusterrole <release>-operator` or a hand-rolled overlay removed verbs the chart originally rendered. Restore via `helm upgrade --install` (idempotent for an installed chart).

### 3. New source kind added but chart's drift gate didn't catch it

The drift-gate test in the chart's `tests/clusterrole-operator_test.yaml` (metio/helm-charts, under `charts/jaas/`) — the "ClusterRole drift gate" case — is supposed to fail at PR time if `operator.FluxSourceKinds` adds a kind without a matching ClusterRole entry. If you reach this runbook page because of a new kind, **it means the test passed but production RBAC is still missing it** — investigate why (test bypassed, chart drift, etc.). Add the verb manually as a hotfix, then file a bug against the drift gate.

After granting the verb, restart the operator pod so a fresh informer picks up the new ServiceAccount-token permissions:

```shell
kubectl --namespace <jaas-ns> rollout restart deploy/jaas
```

Watch-driven re-renders resume within seconds.

## Why not detect this automatically

A startup probe (operator does a test `LIST` per kind and refuses to boot on Forbidden) was considered and rejected:

- It would block startup on transient apiserver flakes during deploys.
- The CRD-watcher pattern already handles the missing-CRD case gracefully (the `crdWatcher` engages a watch dynamically when the CRD becomes `Established=True`). Layering "and also fail on Forbidden" complicates that contract.
- A misconfigured cluster should surface the issue via the existing logs + the `kubectl auth can-i` workflow, which is the standard k8s troubleshooting path.

The diagnostic trail above is the supported recovery story. If a user reports hitting this in the wild and finds the log-grep step too obscure, a follow-up is a `jaas_informer_watch_failures_total` Prometheus counter plus a `JaaSInformerWatchFailing` alert — same shape as `JaaSStorageSweepFailures` from the storage layer. Track in `open-items.md` if it comes up.

## Related runbooks

- [rbacdenied.md](rbacdenied.md) — per-reconcile RBAC denials the reconciler CAN surface (tenant SA can't read a source CR / library, can't write ExternalArtifact). If a snippet's status says `Reason=RBACDenied`, start there instead.
- [storage-recovery.md](storage-recovery.md) — different failure surface (storage backend rather than apiserver), same "graceful degradation, diagnosis via logs + metrics" shape.


---

# Workqueue saturation

Source: https://jaas.projects.metio.wtf/runbooks/workqueue-saturation/


Linked from the `JaaSControllerWorkqueueDepthHigh` alert. Fires when the reconciler's workqueue holds more items than the configured threshold (default 50) for the alert window. Not tied to a `Reason` constant — workqueue depth is a controller-runtime signal, not a per-snippet status.

## Symptom

```text
ALERTS{alertname="JaaSControllerWorkqueueDepthHigh", controller="jsonnetsnippet"}
```

- New snippet writes settle slowly (status takes minutes to flip, not seconds).
- Existing snippets re-render later than `spec.interval` would suggest.
- `kubectl describe jsonnetsnippet` shows a stale `ObservedGeneration`.

## Cause

The operator is dequeuing reconciles more slowly than watch events and interval ticks enqueue them. Common causes, in observed-frequency order:

1. **Slow Publisher backend.** S3 throttling, a slow PVC, or a stalled object-store transaction — each reconcile blocks on the storage `Put`.
2. **API server pressure.** The cluster's apiserver is slow on `GET` / `UPDATE` (often during a control-plane upgrade or under heavy general load).
3. **Per-snippet rate-limiter exhaustion.** A flapping snippet eats its token-bucket budget; the controller's exponential backoff stretches the queue.
4. **A large fan-out from a single source watch.** One Flux source republishes and 100 snippets reference it; every snippet's reconcile lands in the queue at once.
5. **Webhook latency.** When `--enable-webhook` is on, every snippet write traverses the validating webhook. A wedged webhook (cert issue, slow tenant client) holds the apiserver's call open and indirectly enlarges the queue.

## Diagnosis

```shell
# Per-controller queue depth — confirm which controller is saturated
kubectl --namespace <jaas-ns> port-forward svc/jaas-metrics 8083:8083 &
curl -s localhost:8083/metrics | grep -E 'workqueue_depth|workqueue_adds_total'

# Reconcile-time histogram — separates "lots of queued items" (fan-out)
# from "each reconcile is slow" (storage / apiserver).
curl -s localhost:8083/metrics | grep 'controller_runtime_reconcile_time_seconds'
```
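
The histogram's `_sum` and `_count` series give a quick mean reconcile duration per controller, which helps separate the two cases named in the comment above. The filter below is a sketch over the Prometheus text format; the function name `mean_reconcile_seconds` is hypothetical, and a mean hides tail latency, so treat it as a first look rather than a replacement for the p99.

```shell
# Read Prometheus text-format metrics on stdin and print
# "<controller> <mean reconcile seconds>" for each controller.
mean_reconcile_seconds() {
  awk '
    # Extract the value of the controller="..." label from a sample line.
    function controller(line) {
      if (match(line, /controller="[^"]*"/))
        return substr(line, RSTART + 12, RLENGTH - 13)
      return "unknown"
    }
    index($0, "controller_runtime_reconcile_time_seconds_sum") == 1   { sum[controller($0)] = $NF + 0 }
    index($0, "controller_runtime_reconcile_time_seconds_count") == 1 { count[controller($0)] = $NF + 0 }
    END {
      for (c in count)
        if (count[c] > 0) printf "%s %.3f\n", c, sum[c] / count[c]
    }
  ' | sort
}

# Usage (not executed here):
#   curl -s localhost:8083/metrics | mean_reconcile_seconds
```

A low mean alongside a deep queue points at fan-out; a high mean points at slow storage or a slow apiserver.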

Cross-reference operator logs for the slow path:

```shell
kubectl --namespace <jaas-ns> logs deploy/jaas --tail=500 \
  | grep -E 'reconcile|publisher|s3|webhook'
```

If `controller_runtime_reconcile_time_seconds` p99 is also high, the alert is the symptom — `JaaSReconcileLatencyHigh` is the more useful page; see [reconcile-latency.md](reconcile-latency.md).

## Remediation

- **Storage backend slow.** Switch from `local` (PVC) to `s3` for higher write throughput, or vice versa if S3 is throttled. See [storage-recovery.md](storage-recovery.md).
- **Apiserver slow.** Pause spec-update churn (`spec.interval` longer on hot snippets), then wait for control-plane health to return.
- **Rate-limiter exhaustion.** Increase `operator.rerenderBurst` to absorb the spike, then investigate why a snippet is flapping (typically a `Reason*` other than `Synced` keeps firing — check `kubectl get events`).
- **Fan-out from a single source.** Stagger snippet intervals so their watch events don't all settle at once. The controller serializes per-snippet; concurrency across snippets is bounded by `MaxConcurrentReconciles`, and the chart default of 5 is high enough that drag from a single fan-out is unusual.
- **Webhook latency.** `kubectl get validatingwebhookconfiguration jaas-jsonnetsnippet --output yaml` and confirm the `caBundle` is current; restart the operator if the cert was rotated externally.

## Prevention

- Set `operator.metrics.prometheusRule.enabled: true` so this alert fires *before* downstream consumers notice.
- Cap `--max-artifact-bytes` so a runaway snippet can't slow every Publisher write behind it.
- For multi-replica HA, leader election keeps only one replica reconciling — workqueue depth on the lease-holder is the only one that matters.


---

# Contributing

Source: https://jaas.projects.metio.wtf/contributing/


Build and test JaaS inside a containerized dev shell — the only host requirement
is a container runtime — and understand the CI gate and the calendar-based release
pipeline. Source, issues, and releases live at
[github.com/metio/jaas](https://github.com/metio/jaas).


---

# Building and Testing

Source: https://jaas.projects.metio.wtf/contributing/building/


The host needs no Go toolchain. All build and test commands run inside a containerized dev shell defined by `dev/Containerfile` and driven by `ilo`. The `.ilo.rc` at the repo root supplies the shell arguments, so the short form is always available:

```shell
ilo bash -c '<command>'
```

The dev shell pre-installs the Go toolchain, all static-analysis tools, `controller-gen`, and the envtest asset bundle. The Go module cache is mounted from `${XDG_CACHE_HOME}/go` so repeated builds do not re-download dependencies.

## Building

```shell
ilo bash -c 'go build -o jaas .'
```

The `Dockerfile` builds the production image. It accepts `VERSION` and `COMMIT` build args:

```shell
docker build -t ghcr.io/metio/jaas:dev .
docker build --build-arg VERSION=v1 --build-arg COMMIT=abc123 -t ghcr.io/metio/jaas:dev .
```

## Regenerating generated code

CRD manifests and the `zz_generated.deepcopy.go` file are generated by `controller-gen`. Run this after touching `api/v1/` types:

```shell
# Regenerate deep-copy functions
ilo bash -c 'controller-gen object paths=./api/v1/...'

# Regenerate CRD manifests under config/crd/bases/
ilo bash -c 'controller-gen crd paths=./api/v1/... output:crd:dir=./config/crd/bases'
```

## Static analysis

golangci-lint is not used. The standalone tools below run directly, both in CI and locally inside the dev shell:

```shell
ilo bash -c 'go vet ./...'
ilo bash -c 'staticcheck ./...'      # config: staticcheck.conf, checks = ["all"]
ilo bash -c 'gofumpt -l .'           # empty output means clean; any output is a failure
ilo bash -c 'gosec ./...'            # inline #nosec justifications silence false positives
ilo bash -c 'govulncheck ./...'      # reachable-from-code advisories only
ilo bash -c 'arch-go'                # architecture rules; config: arch-go.yml
```

## Test layers

### Pure unit tests

Table-driven tests with no external state. They live next to the code they cover across `internal/...` and `api/v1/`. Several act as drift gates: `conditions_test.go` verifies that every `Reason*` constant has a matching `docs/runbooks/<reason>.md`, and `TestErrorResponse_StableCodeValues` pins the wire-level `ErrCode*` strings against accidental rename.

```shell
ilo bash -c 'go test -count=1 -race -cover ./...'
```

To run a single test by name:

```shell
ilo bash -c 'go test -count=1 -v -run TestName ./internal/handler/'
```

### Envtest-backed operator tests

Files named `envtest_*_test.go` (in `internal/operator/`, `internal/webhook/selfsigned/`, and `main_envtest_test.go` at the repo root) boot a real `kube-apiserver` and `etcd` via controller-runtime's `envtest` package and run the reconciler, webhook, and full `run(...)` function against them.

The tests share one apiserver instance per test binary, guarded by a `sync.Once`, so the startup cost is paid once. Each test `t.Skip`s when `KUBEBUILDER_ASSETS` is unset — there is no build tag. The dev shell pre-stages the envtest asset bundle via `setup-envtest` (pinned `ENVTEST_K8S_VERSION`) and exports `KUBEBUILDER_ASSETS`, so these tests run by default inside `ilo bash`. On a host without the bundle they silently skip.

The envtest harness sets `Config.SkipImpersonation` (the only place that setting is allowed) and defaults `MetricsBindAddress` to `"0"` so parallel test cases do not fight over the metrics port.

```shell
ilo bash -c 'go test -count=1 -race -cover ./...'
```
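
Because these tests skip rather than fail when `KUBEBUILDER_ASSETS` is missing, a green run outside the dev shell can be misleading. The helpers below are a sketch of one way to make that visible; both function names are hypothetical and not part of the repository.

```shell
# Fail loudly instead of letting envtest-backed tests skip unnoticed.
require_envtest_assets() {
  if [ -z "${KUBEBUILDER_ASSETS:-}" ] || [ ! -d "$KUBEBUILDER_ASSETS" ]; then
    echo "KUBEBUILDER_ASSETS is unset or not a directory; envtest tests would skip" >&2
    return 1
  fi
}

# Count skipped tests (including indented subtests) in `go test -v` output on stdin.
count_skipped() {
  grep -c '^ *--- SKIP' || true
}

# Usage (not executed here):
#   require_envtest_assets && go test -count=1 -v ./internal/operator/ | count_skipped
```

A non-zero skip count on a host where the bundle is expected to be present is worth a second look.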

### Golden / example end-to-end tests

`examples_test.go` boots the full binary via `runInBackground` and asserts HTTP responses against golden files under `testdata/golden/`. Comparison is semantic — both sides are parsed as JSON and compared on the parsed values, so whitespace and key ordering are irrelevant.

After changing an example or adding a new one, regenerate the golden files:

```shell
ilo bash -c 'go test -update ./...'
```

Inspect and commit the diff in `testdata/golden/`.

### Fuzz tests

Fuzz targets in `internal/handler/`, `internal/sources/`, and `internal/urlguard/` harden the request path, the tar/gzip artifact unpacker, and the SSRF URL/IP parser against adversarial input. CI exercises their seed corpus as ordinary unit tests. To fuzz interactively:

```shell
ilo bash -c 'go test -fuzz=FuzzName -fuzztime=30s ./internal/urlguard/'
```

### Benchmarks

Throughput benchmarks in `internal/eval/`, `internal/storage/`, and `internal/operator/` cover reconcile throughput, watch mapping, and the tenant-client cache. They are baselines, not merge gates. The reconcile benchmark is envtest-backed and skips without `KUBEBUILDER_ASSETS`.

```shell
ilo bash -c 'go test -bench=. -benchmem -run=^$ ./internal/operator/'
```

### Kind operator smoke tests

The cluster-level layer runs outside `go test`. Pure-`kubectl` bash scenarios in `hack/smoke/` run against a real kind cluster via `.github/workflows/kind-smoke.yml`. To run a scenario locally against any reachable cluster, deploy JaaS and invoke the scenario scripts directly:

```shell
hack/smoke/scenario-inline-files.sh
```

See [CI and releases](/contributing/ci-and-release/) for how the smoke layer fits into the two-angle end-to-end strategy.


---

# CI and Releases

Source: https://jaas.projects.metio.wtf/contributing/ci-and-release/


## Static analysis

golangci-lint is not used. The tools below run directly, both in CI and locally inside the dev shell (see [Building and Testing](/contributing/building/)). Every tool is a separate, auditable binary with its own config file.

| Tool | Scope | Config |
|------|-------|--------|
| `go vet` (all analyzers) | Go correctness | — |
| [staticcheck](https://staticcheck.dev) | Bugs, simplifications, style | `staticcheck.conf` (`checks = ["all"]`) |
| [gosec](https://github.com/securego/gosec) | Security patterns | inline `#nosec` justifications |
| [govulncheck](https://go.dev/security/vuln/) | Known vulnerabilities in the dependency graph | — |
| [arch-go](https://github.com/arch-go/arch-go) | Architecture rules | `arch-go.yml` |
| [gofumpt](https://github.com/mvdan/gofumpt) | Strict formatting | — |
| [REUSE](https://reuse.software) | License / copyright metadata on every file | `REUSE.toml` |
| [yamllint](https://yamllint.readthedocs.io) | YAML | `.yamllint.yaml` |
| [actionlint](https://github.com/rhysd/actionlint) | GitHub Actions workflows | — |
| [markdownlint](https://github.com/DavidAnson/markdownlint-cli2) | Markdown | `.markdownlint.yaml` |
| [typos](https://github.com/crate-ci/typos) | Spelling | `.typos.toml` |
| [Trivy](https://github.com/aquasecurity/trivy) | Container image CVEs | — |

### Architecture rules

`arch-go.yml` pins two invariants enforced with 100% compliance:

- `api/v1` depends on neither the operator internals nor `sigs.k8s.io/controller-runtime`. The CRD types stay importable by external consumers without dragging the manager in. Scheme registration uses apimachinery's `runtime.NewSchemeBuilder` for exactly this reason.
- `internal/urlguard` — the SSRF-defence layer — depends on the standard library only, with no internal and no external imports. This keeps the IP/URL validation logic self-contained and straightforward to fuzz in isolation.

## The verify.yml PR gate

`.github/workflows/verify.yml` fans out into one job per concern. A failure points straight at the offending gate. CI installs each tool fresh via `go run <tool>@latest`; the dev shell pre-installs the same tools at the same versions, so local and CI runs agree.

| Job | What it runs |
|-----|--------------|
| `test` | `go build ./...` then `go test -v -cover ./...` |
| `lint-go` | `go vet ./...`, `staticcheck ./...`, `gosec ./...`, `gofumpt -l .` (fails on any output) |
| `vulnerabilities` | `govulncheck ./...` — reachable-from-code advisories are a hard merge gate |
| `architecture` | `arch-go` against `arch-go.yml` |
| `reuse` | `fsfe/reuse-action` — every file must carry SPDX headers |
| `yaml` | `yamllint` against `.yamllint.yaml` |
| `github-actions` | `actionlint` |
| `markdown` | `markdownlint-cli2` against `.markdownlint.yaml` |
| `typos` | `typos` against `.typos.toml` |
| `prose` | Vale against the shared metio/vale-config style; error-level findings (naming/branding) fail the gate |
| `container-image` | `docker buildx` build (load, no push) followed by Trivy scan; hard-fails on any fixable `CRITICAL`/`HIGH` |

### All-green aggregate

The workflow ends with a single `all-green` job:

- `needs` every other job
- runs `if: always()`
- fails unless each dependency `result` is `success` or `skipped`

That one job is the **only** check marked required in branch protection. Adding a new job to the `needs` list of `all-green` covers it automatically; no new required check needs to be registered.
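
The pass/fail rule itself is small. As a sketch of the logic only (not the workflow's actual step, and with a hypothetical function name), it amounts to the following, where each argument is one dependency's `result` value:

```shell
# Succeed only when every dependency result is "success" or "skipped".
all_green() {
  for result in "$@"; do
    case "$result" in
      success|skipped) ;;
      *) echo "blocking result: $result" >&2; return 1 ;;
    esac
  done
}
```

The `if: always()` condition matters here: without it the aggregate job would itself be skipped when a dependency fails, so the rule would never get a chance to fail.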

The `govulncheck` gate is a hard blocker. A reachable-from-code advisory that cannot be fixed by bumping a dependency blocks the PR until resolved. Resolution is usually a `toolchain` bump in `go.mod` (for stdlib advisories) or `go get -u` (for module advisories).

## The release pipeline

Releases are calendar-based and automated. `.github/workflows/release.yml` runs on a Monday cron (`47 7 * * MON`) plus manual `workflow_dispatch`. The version is computed from the run date:

```shell
date +'%Y.%-m.%-d'
```

For a Monday run on 2026-06-22 that produces `2026.6.22`.

goreleaser is not used. GPG is not used. The pipeline is hand-rolled across four jobs.

### prepare

Counts commits since the last release touching the build-relevant paths (`go.mod main.go internal api config Dockerfile`). Every downstream job gates on that count being non-zero (or there being no prior release at all), so an empty week publishes nothing.
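
One way to reproduce that count locally is a path-limited `git rev-list`. This is a sketch under assumptions: the workflow's real command is not shown here, the function name is hypothetical, and the argument stands for whatever ref marks the last release.

```shell
# Count commits after a given ref that touch the build-relevant paths.
build_relevant_commits() {
  since="$1"   # hypothetical: tag or commit of the last release
  git rev-list --count "${since}..HEAD" -- \
    go.mod main.go internal api config Dockerfile
}
```

A result of `0` corresponds to the empty week described above, in which nothing is published.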

### build

A cross-compile matrix over ten platform/arch combinations:

- `linux/amd64`, `linux/arm` (v7), `linux/arm64`, `linux/ppc64le`, `linux/riscv64`, `linux/s390x`
- `windows/amd64`, `windows/arm64`
- `darwin/amd64`, `darwin/arm64`

Each platform compiles with:

```shell
CGO_ENABLED=0 go build -trimpath \
  -ldflags="-s -w -X main.version=<ver> -X main.commit=<sha>" .
```

Archives are `tar.gz` on linux/darwin and `zip` on windows (with a `.exe` binary), each bundling `LICENSE` and `README.md`.
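
The matrix and the archive rule can be summarized as a small table generator. This is an illustrative sketch with a hypothetical function name, not the workflow's own matrix definition; it only prints the targets and does not build anything.

```shell
# Print one line per release target: "<GOOS> <GOARCH> <archive format>".
release_targets() {
  for target in linux/amd64 linux/arm linux/arm64 linux/ppc64le linux/riscv64 \
                linux/s390x windows/amd64 windows/arm64 darwin/amd64 darwin/arm64; do
    goos=${target%/*}
    goarch=${target#*/}
    case "$goos" in
      windows) archive=zip ;;      # windows archives also carry a .exe binary
      *)       archive=tar.gz ;;
    esac
    printf '%s %s %s\n' "$goos" "$goarch" "$archive"
  done
}

# Usage (not executed here). Assumption: the "(v7)" note maps to GOARM=7 for linux/arm.
#   release_targets | while read -r goos goarch archive; do
#     echo "GOOS=$goos GOARCH=$goarch -> $archive"
#   done
```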

### container

A single `docker buildx` multi-arch push to `ghcr.io/metio/jaas:{latest,<version>}` over the six linux arches. The `Dockerfile` builder is pinned to `$BUILDPLATFORM` and cross-compiles via Go's `GOARCH`, so the multi-arch build needs no QEMU.

SBOM and provenance are attached. The image is signed with cosign keyless immediately after push:

```shell
cosign sign \
  --yes \
  --annotations "repo=metio/jaas" \
  --annotations "workflow=Automated Release" \
  ghcr.io/metio/jaas@<digest>
```

Identity is proven by a short-lived certificate that Fulcio issues against the workflow's OIDC token; there is no key to distribute.

### github

Gates on both `build` and `container` succeeding. Downloads all platform archives, computes a single `SHA256SUMS` over them, signs it with cosign keyless (Sigstore bundle format), and publishes the GitHub Release with all archives, the checksum file, and the bundle attached.

To verify a release download:

```shell
cosign verify-blob jaas_<version>_SHA256SUMS \
  --bundle jaas_<version>_SHA256SUMS.bundle \
  --certificate-identity-regexp '^https://github.com/metio/jaas/\.github/workflows/release\.yml@refs/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
sha256sum -c jaas_<version>_SHA256SUMS
```

To verify the container image:

```shell
cosign verify ghcr.io/metio/jaas:<version> \
  --certificate-identity-regexp '^https://github.com/metio/jaas/\.github/workflows/release\.yml@refs/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
```