# Jsonnet-as-a-Service — full documentation

> The complete Jsonnet-as-a-Service documentation (https://jaas.projects.metio.wtf/) concatenated for
> LLMs. For a concise link index see https://jaas.projects.metio.wtf/llms.txt.

# Jsonnet-as-a-Service

**Jsonnet-as-a-Service (JaaS)** evaluates [Jsonnet](https://jsonnet.org/) and returns JSON. It runs in one of two modes:

- **OCI volume mounting** — the chart mounts your snippets and libraries from OCI artifacts as image volumes, and JaaS serves the evaluated JSON over HTTP (`GET /jsonnet/`). Static content, no custom resources.
- **Flux CR-based** — JaaS watches `JsonnetSnippet` and `JsonnetLibrary` resources and publishes the rendered output as a Flux [`ExternalArtifact`](https://fluxcd.io/flux/components/source/externalartifacts/) that any Flux consumer deploys.

The two modes are mutually exclusive in one chart release; [Installation](/installation/kubernetes/) covers choosing one.

## What teams build with it

- **Grafana dashboards with grafana-operator.** Author dashboards in Jsonnet (grafonnet), let JaaS render them, and have the [grafana-operator](https://grafana.github.io/grafana-operator/) reconcile the result into Grafana. See [Grafana dashboards](/tutorials/grafana-dashboards/).
- **Kubernetes manifests with stageset-controller.** Render manifests from Jsonnet and roll them out in ordered, gated stages with [stageset-controller](https://stageset.projects.metio.wtf/). See [Deploying manifests with StageSet](/tutorials/deploying-manifests/).

Both build on the same core: a snippet renders to an `ExternalArtifact`, and a downstream controller consumes it. JaaS only renders — what happens to the JSON is the consumer's concern, documented on that consumer's own site.

## Where to start

- [Quickstart](/tutorials/quickstart/) — from a Helm install to a published artifact in a few steps.
- [Tutorials](/tutorials/) — the two integrations above, plus running JaaS as a cluster-free local renderer.
- [Usage](/usage/) — one page per feature, for both the HTTP renderer and the operator.
- [Installation](/installation/) — Helm install, production hardening, and the full configuration reference.
- [API reference](/api/) — every field of `JsonnetSnippet`, `JsonnetLibrary`, and the `ExternalArtifact` output contract.
- [Runbooks](/runbooks/) — symptom, cause, and remediation for every Ready-condition reason.

## Project

- Source, releases, and the container image: [github.com/metio/jaas](https://github.com/metio/jaas)
- Helm chart: [`oci://ghcr.io/metio/helm-charts/jaas`](https://github.com/metio/helm-charts/tree/main/charts/jaas)

---

# Installation

Source: https://jaas.projects.metio.wtf/installation/

Install JaaS with [Helm](https://helm.sh/) in one of two mutually exclusive shapes — the stateless HTTP renderer or the Flux operator — then harden it for production and operate it day to day. The [configuration reference](/installation/configuration/) lists every flag and its chart value.

---

# Configuration reference

Source: https://jaas.projects.metio.wtf/installation/configuration/

Every JaaS flag is listed here with its default and a one-line description. Run `jaas --help` to see the same list at runtime. The tables on this page are generated from the binary's own flag definitions, so they never drift from the runtime contract.

The Helm chart exposes most flags under `arguments.*`; operator-specific flags are under `operator.*`. The full set of chart values is in the [Helm chart values](/installation/helm-values/) reference.

## Jsonnet server

The Jsonnet server evaluates snippets and returns JSON. It binds on `--listen-address:--port` by default.

{{< flag-table group="Jsonnet server" >}}

## Management server

The management server exposes the three Kubernetes probe endpoints. It binds on `--management-listen-address:--management-port`.
{{< flag-table group="Management server" >}}

Endpoints: `GET /start` (startup probe), `GET /ready` (readiness probe), `GET /live` (liveness probe). Startup and readiness return `503` with a `{"status":"…"}` JSON body when the server is not yet ready. Liveness is an unconditional `200`.

## Snippets and libraries

Flags for declaring the Jsonnet files the server serves.

{{< flag-table group="Snippets and libraries" >}}

Snippet name resolution uses Go's `os.OpenRoot`, which rejects `..` traversal and symlinks that escape the configured directory. This is security-critical; see [Evaluation and security](/usage/evaluation-and-security/).

## External variables

{{< flag-table group="External variables" >}}

**Environment variable alternative:** set `JAAS_EXT_VAR_=` to expose `` as an external variable. The `--ext-var` flag overrides the env mechanism on key conflict. See [External variables and TLAs](/usage/external-variables-and-tlas/) for usage examples.

## Evaluation limits

{{< flag-table group="Evaluation limits" >}}

`--evaluation-timeout` fires the HTTP response but does not terminate the underlying go-jsonnet call — the evaluation continues consuming CPU until it finishes naturally. Size container resources accordingly and use `--max-concurrent-evals` to bound worst-case goroutine pile-up. See [Evaluation and security](/usage/evaluation-and-security/) for the full discussion.

## Lifecycle

{{< flag-table group="Lifecycle" >}}

## Operator (Flux integration)

The following flags are only active when `--enable-flux-integration` is set.

{{< flag-table group="Operator (Flux integration)" >}}

**Environment variable:** `JAAS_WATCH_NAMESPACES` — comma-separated namespace list. Superseded by `--watch-namespaces` when both are set.

## Storage server (local and S3)

The storage server is the HTTP file server that downstream Flux consumers fetch artifacts from. It is started only when `--enable-flux-integration` is set.
{{< flag-table group="Storage server (local and S3)" >}}

### S3 flags

Active only when `--storage-backend=s3`.

{{< flag-table group="S3 flags" >}}

## Webhook (TLS provisioning)

Active only when `--enable-webhook` is set (which also requires `--enable-flux-integration`).

{{< flag-table group="Webhook (TLS provisioning)" >}}

See [Admission webhook](/usage/admission-webhook/) for the full `failurePolicy` trade-off and cert rotation details.

## Leader election

{{< flag-table group="Leader election" >}}

## Observability

### Metrics

{{< flag-table group="Metrics" >}}

### Tracing

{{< flag-table group="Tracing" >}}

## Logging and lifecycle

{{< flag-table group="Logging and lifecycle" >}}

---

# Helm chart values

Source: https://jaas.projects.metio.wtf/installation/helm-values/

The jaas Helm chart lives in the [metio/helm-charts](https://github.com/metio/helm-charts/tree/main/charts/jaas) monorepo and is published at `oci://ghcr.io/metio/helm-charts/jaas`. The tables below are generated from each chart's `values.yaml`, so they track the chart's current values rather than a hand-maintained copy.

For how the values map onto the binary's runtime behaviour, see the [Configuration reference](/installation/configuration/) — every `arguments.*` value drives the corresponding `--flag`.

## jaas chart

{{< helm-values data="helm-values" >}}

## joi library chart

The [joi](https://github.com/metio/helm-charts/tree/main/charts/joi) chart publishes [Jsonnet OCI Images](https://github.com/metio/jsonnet-oci-images) as `JsonnetLibrary` + `OCIRepository` pairs, so snippets can import vendored libraries (grafonnet, k8s-libsonnet, …) without bundling them. Deploy it alongside jaas when snippets reference shared libraries.

{{< helm-values data="joi-values" >}}

---

# Kubernetes

Source: https://jaas.projects.metio.wtf/installation/kubernetes/

JaaS ships as a container image at `ghcr.io/metio/jaas:latest` and as a Helm chart at `oci://ghcr.io/metio/helm-charts/jaas`.
Pre-built binaries for Linux, macOS, and Windows are attached to each GitHub release for operators who prefer to run the binary directly.

## Prerequisites

- A [Kubernetes](https://kubernetes.io/) cluster, **v1.28 or later**, with `kubectl` configured against it.
- [Helm](https://helm.sh/) **v3.14 or later** — OCI chart support is required to pull the chart from `ghcr.io`.

The Flux CR-based mode (below) additionally needs:

- [Flux](https://fluxcd.io/) **v2.7.0 or later** in the cluster — the `ExternalArtifact` CRD that JaaS publishes lands in v2.7.0.
- [cert-manager](https://cert-manager.io/) — **only** if you set the admission webhook to `cert-manager` mode. The chart defaults to `self-signed`, which provisions and rotates the webhook's TLS in-process and needs no cert-manager; see [Production](/installation/production/#admission-webhook-tls) for the trade-off.

The OCI volume-mounting mode needs neither Flux nor cert-manager.

## Install and update

`helm upgrade --install` is idempotent: the same command installs the chart the first time and applies your changes on every subsequent run, so it's the only deploy command you need. To update later, re-run it with an updated `--values` file or `--set` flags.

The chart runs JaaS in one of two mutually exclusive modes in a single release. Pick the one that matches your use case; you **cannot** combine them in one release — the chart's pre-install preflight rejects the combination.

### Mode 1 — OCI volume mounting (HTTP renderer)

JaaS evaluates Jsonnet snippets on demand and returns JSON over HTTP. Snippets and libraries are mounted into the pod from OCI artifacts as image volumes (the `snippets` and `additionalLibraries` chart values), read straight from a registry. There are no CRDs, no leader election, and no persistent storage — the pod is stateless.
```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --values my-values.yaml \
  --wait
```

A minimal `my-values.yaml` — `snippets` and `additionalLibraries` are maps of `name: image-reference`:

```yaml
# Snippets to render — a map of name: image. The name becomes the URL path, so
# this snippet is served at GET /jsonnet/dashboards.
snippets:
  dashboards: ghcr.io/my-org/my-dashboards:latest

# Well-known libraries have a built-in toggle — enable grafonnet with one flag
# (the chart already knows its JOI image). docsonnet and xtd work the same way.
libraries:
  grafonnet:
    enabled: true

# additionalLibraries mounts any OTHER library image — a JOI library without a
# built-in toggle, or your own private bundle. The map KEY is the directory the
# image mounts under and that the renderer adds to its import search path
# (`--library-path /srv/libraries/`); it must be unique. The entry below
# mounts ghcr.io/acme/jsonnet-acme-lib at /srv/libraries/acme.
additionalLibraries:
  acme: ghcr.io/acme/jsonnet-acme-lib:latest
```

The chart mounts each image read-only and wires the renderer for you. The `dashboards` snippet is then reachable at `GET /jsonnet/dashboards`. A library is imported by the path it resolves to under its search directory — for a jb-vendored image like grafonnet, the full vendor path baked into it:

```jsonnet
import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet'
```

The Jsonnet HTTP server listens on port `8080` (configurable via `ports.http`).

### Mode 2 — Flux CR-based (operator)

JaaS watches `JsonnetSnippet` and `JsonnetLibrary` CRs, evaluates snippets, and publishes the results as `ExternalArtifact` resources. Downstream Flux consumers (kustomize-controller, helm-controller, stageset-controller) fetch the rendered JSON from the artifact server.
```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system --create-namespace \
  --set operator.enabled=true \
  --set operator.storage.persistence.enabled=true \
  --wait
```

A minimal values snippet for the operator shape:

```yaml
operator:
  enabled: true
  storage:
    # local backend with a PVC — enough for a single-replica install. For
    # multi-replica HA, switch to backend: s3 (see /installation/production/).
    backend: local
    persistence:
      enabled: true
      size: 10Gi
```

The operator publishes artifacts at the URL configured via `operator.storage.baseURL`. Left empty, it defaults to the in-cluster Service DNS name (`http://jaas-storage..svc.cluster.local:`), which is correct when downstream Flux consumers fetch artifacts from inside the cluster. Set it explicitly only when consumers dereference the artifacts through an Ingress or external hostname.

### How CRDs are handled

The chart ships its CRDs (`JsonnetSnippet`, `JsonnetLibrary`) inside the regular templates — not Helm's special `crds/` directory — so a `helm upgrade --install` applies schema changes like any other resource, governed by `crds.create` (default `true`). The CRDs carry `helm.sh/resource-policy: keep`, so a `helm uninstall` leaves them — and your existing resources — in place; remove them by hand only if you really mean to.

Check [MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md) before upgrading across a release that changes an immutable field such as a Deployment's `spec.selector.matchLabels` — those require a manual `kubectl --namespace jaas-system delete deploy jaas` first.

If you manage CRDs out of band, the raw definitions are published in the repository under `config/crd/bases/` and can be applied with `kubectl apply --server-side -f`.

## Customize

Every setting the chart exposes — the two modes above, storage backend, leader election, the admission webhook, NetworkPolicy, service mesh, metrics, and the rest — is a Helm value.
Two references cover them:

- [Helm chart values](/installation/helm-values/) — the full values reference, generated from the chart's own schema.
- [Configuration reference](/installation/configuration/) — every binary flag and the chart value that drives it.

For production sizing — S3 storage, multi-replica HA, observability, and webhook hardening — see the [Production guide](/installation/production/).

## Verify

For the operator shape, confirm the Deployment is available and the CRDs are registered:

```shell
kubectl --namespace jaas-system rollout status deploy/jaas
kubectl get crd jsonnetsnippets.jaas.metio.wtf jsonnetlibraries.jaas.metio.wtf
```

For the HTTP renderer, confirm the pod is ready and the endpoint answers (the path below uses the `dashboards` snippet from the Mode 1 example values; substitute your own snippet name):

```shell
kubectl --namespace jaas-system get pods --selector app.kubernetes.io/name=jaas
kubectl --namespace jaas-system port-forward svc/jaas 8080:8080 &
curl http://localhost:8080/jsonnet/dashboards
```

## Next steps

- [Quickstart tutorial](/tutorials/quickstart/) — five steps from a Helm install to a published artifact.
- [Production hardening](/installation/production/) — storage, observability, the admission webhook, and multi-replica HA.

---

# Operations

Source: https://jaas.projects.metio.wtf/installation/operations/

Day-two operations for a running JaaS install. Initial install and hardening decisions are in [Kubernetes](/installation/kubernetes/) and [Production](/installation/production/).

## Graceful shutdown and drain

When Kubernetes sends `SIGTERM`, JaaS executes a two-phase shutdown to avoid dropping in-flight requests:

1. The readiness probe flips to `false` (`503` on `/ready`). Kubernetes endpoint controllers begin deregistering the pod from Services.
2. JaaS waits for `--shutdown-delay` (default `5s`) before closing its listeners. This window lets the endpoint propagation complete so no new traffic arrives after the server closes.
3. After the delay, the servers shut down gracefully with a 30-second `context.WithTimeout`.
The operator goroutine is also cancelled and awaited within the same 30-second window.

The distroless runtime image has no `sleep` binary, so the drain delay is implemented in the binary rather than via a `preStop` hook. A second `SIGTERM` (or `SIGINT`) during the drain cuts the wait short.

To disable the drain (zero delay):

```shell
--shutdown-delay 0
```

The chart value is `arguments.shutdownDelay`.

## Leader election during rolling updates

In operator mode, leader election is on by default (`--leader-election`, `operator.leaderElection.enabled: true`). The chart sets `LeaderElectionReleaseOnCancel: true`, so when the old pod receives `SIGTERM` it releases the lease immediately instead of waiting out the 15-second `LeaseDuration`. The new pod picks up the lease within milliseconds.

Snippets that were `Ready=True` before the restart stay in that condition via cached state. A new pod that takes over as leader reconciles them on the next watch event. If snippets remain degraded for more than a few seconds after a restart, check the [operator-watch-silent](/runbooks/operator-watch-silent/) runbook — it diagnoses the case where the operator's own ClusterRole is missing a verb so controller-runtime's informer silently fails to start.

To force-restart the operator (e.g. after an upgrade):

```shell
kubectl rollout restart deployment/jaas --namespace jaas-system
```

## Artifact retention and storage GC

Three independent mechanisms govern how long artifacts stay on disk (or in S3). Full storage backend configuration is in [Storage and HA](/usage/storage-and-ha/).

### GC grace window (`--artifact-gc-grace`, default `5m`)

When a snippet is re-rendered, the superseded revision drops out of the keep-set but remains fetchable for `--artifact-gc-grace` after supersession. This closes the pin→fetch race in which a Flux consumer reads `status.artifact.url` a moment before the operator GC-prunes the old tarball.
The window survives operator restarts — supersession time is derived from on-disk storage metadata, not from in-memory state.

Set `0` to restore eager pruning (one revision at a time, matching stock Flux source-controller semantics). Tune lower when storage capacity is tight and all consumers are in-cluster.

### History retention (`spec.history`, default `1`, max `50`)

Per-snippet deliberate retention for rollback and blue-green flows. A downstream consumer can pin to a specific `sha256` digest indefinitely as long as that revision is within the history keep-set. This is separate from the GC grace window — it is explicit operator intent, not a race-protection mechanism.

### Orphaned `.tmp` sweep (`--storage-sweep-interval`, `--storage-sweep-max-tmp-age`)

A `Put` that dies between writing the tempfile and the atomic rename leaves a `.tar.gz.tmp` residue. The sweep goroutine runs on a ticker (default every `10m`) and removes `.tmp` files older than `--storage-sweep-max-tmp-age` (default `30m`). The age floor ensures live writers are never raced. Set `--storage-sweep-interval 0` to disable the sweep entirely.

The `jaas_storage_sweep_failures_total` Prometheus counter signals failing sweep passes.

## Finalizer teardown and the WithdrawForced safety valve

Every `JsonnetSnippet` holds a finalizer (`jaas.metio.wtf/finalizer`) that blocks Kubernetes garbage collection until the operator successfully calls `Publisher.Withdraw` to remove the artifact from storage. If the backend is permanently unavailable (S3 down, RBAC revoked, bucket deleted), the finalizer would otherwise hold the snippet — and by extension its namespace — in `Terminating` forever.

`--max-withdraw-wait` (default `1h`) bounds how long the finalizer can hold. Once the deadline passes, the operator:

1. Emits a `Warning WithdrawForced` Kubernetes Event on the snippet.
2. Drops the finalizer so the snippet can be garbage-collected.

The trade-off is a possible orphan tarball in storage.
Recover it using the [storage-recovery](/runbooks/storage-recovery/) runbook.

Adjust the bound with the chart value `operator.maxWithdrawWait`. Lower it in environments where namespace teardown latency is critical; raise it (or remove the concern by fixing the backend) in environments where artifact-safety is paramount.

## Upgrades

Calendar-based releases ship every Monday. The chart version and the binary version advance together.

```shell
helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \
  --namespace jaas-system \
  --values my-values.yaml \
  --wait --timeout 5m
```

The chart ships CRDs under `templates/` so `helm upgrade --install` applies schema changes automatically.

**Before each upgrade**, read [MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md):

- Releases that change `spec.selector.matchLabels` on the Deployment require a manual `kubectl delete deployment/jaas` first — that field is immutable and `helm upgrade --install` will fail otherwise.
- The pre-delete cleanup Job (`operator.cleanupOnDelete.enabled: true`, the default) runs on `helm uninstall` and drops every snippet's finalizer so `ExternalArtifact` resources are unwound before the operator pod is removed. If the cleanup Job hangs, check `operator.cleanupOnDelete.kubectlTimeout` (default `2m`) and the backend health.

## Monitoring operational health

Key signals to watch:

- `jaas_storage_sweep_failures_total` — non-zero means the sweep goroutine is erroring; investigate storage backend health.
- `jaas_snippet_reconcile_total{status!="Synced"}` — elevated rate means snippets are failing to render; cross-reference with the `reason` label and the relevant runbook.
- `JaaSControllerWorkqueueDepthHigh` PrometheusRule alert — workqueue is backing up; the operator cannot keep up with the reconcile rate.
- `/ready` probe on the management port (default `8081`) — `503` after startup means the manager has not yet been elected or its cache has not synced.
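
The `/ready` signal in the list above can be polled from a shell. The helper below is a minimal sketch, not part of JaaS: the function name `ready_status` and its base-URL argument are hypothetical, and it assumes a ready server answers `200` (the documentation only states the `503` not-ready case explicitly).

```shell
# ready_status BASE_URL
# Prints "ready" when GET BASE_URL/ready returns 200, otherwise
# "not ready (<http code>)". BASE_URL is an assumption, for example
# http://localhost:8081 behind a local port-forward.
ready_status() {
  # --write-out '%{http_code}' prints only the numeric status code;
  # curl reports 000 when no HTTP response was received at all.
  code=$(curl --silent --output /dev/null --write-out '%{http_code}' "$1/ready")
  if [ "$code" = "200" ]; then
    echo "ready"
  else
    echo "not ready ($code)"
  fi
}
```

With the management port forwarded locally (for example `kubectl --namespace jaas-system port-forward deploy/jaas 8081:8081`, assuming the default port), `ready_status http://localhost:8081` prints one line that is easy to use in a watch loop or a smoke test.
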
All metrics are documented in [Observability](/usage/observability/). All shipped alerts link to [Runbooks](/runbooks/).

## Next steps

- [Configuration reference](/installation/configuration/) — the full flag list with defaults and chart value equivalents.
- [Runbooks](/runbooks/) — incident response procedures keyed to each `Ready` condition `Reason`.

---

# Production

Source: https://jaas.projects.metio.wtf/installation/production/

The chart's defaults are safe for an initial install but not optimised for sustained production workloads. Work through these decisions before exposing JaaS to real traffic. Each links to the detailed guide.

## 1. Pick a storage backend

The single largest decision. Artifacts must survive pod restarts and, for HA, be readable by every replica simultaneously.

| Backend | Persistence | Multi-replica HA |
|---|---|---|
| `local` + emptyDir (chart default) | No | No |
| `local` + RWO PVC | Yes | No — single replica only |
| `local` + RWX PVC | Yes | Yes — requires RWX storage class |
| `s3` | Yes | Yes — leader writes, all replicas read |

For cloud installs, `s3` (AWS S3, MinIO, Ceph RGW, GCS S3-compat API) is the recommended backend. Pair it with leader election (on by default) so only the lease-holder writes. For on-prem, a PVC with the access mode your storage class supports is the practical path.

Full configuration options and artifact retention are covered in [Storage and HA](/usage/storage-and-ha/) — including the garbage-collection grace period (`--artifact-gc-grace`) that keeps a just-superseded revision fetchable for a short window, so a consumer that read `status.artifact` moments before pruning doesn't 404 on the revision it pinned.
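
The grace period can be observed from the consumer side by checking whether a previously pinned artifact URL still downloads. This is a minimal sketch under stated assumptions: `revision_fetchable` is a hypothetical helper, not part of JaaS, and the URL is whatever the consumer read from the snippet's status earlier.

```shell
# revision_fetchable URL
# Succeeds (exit 0) if the artifact at URL still downloads, fails otherwise.
# curl --fail turns HTTP errors such as 404 into a non-zero exit status.
revision_fetchable() {
  curl --fail --silent --show-error --output /dev/null "$1"
}
```

A consumer-side check could then read `revision_fetchable "$URL" && echo "still served" || echo "pruned or unreachable"`, where `$URL` is the pinned artifact URL.
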

Minimal S3 values (IRSA on EKS):

```yaml
operator:
  enabled: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/jaas-operator
  storage:
    backend: s3
    s3:
      endpoint: s3.amazonaws.com
      bucket: my-jaas-artifacts
      prefix: prod
      region: eu-west-1
      useSSL: true
      # Leave accessKey/secretKey empty — IAM role via SA annotation.
```

## 2. Size CPU and memory

The chart defaults (64 MiB memory, 32m CPU) are fine for a quickstart but will OOM under sustained snippet rendering. Each in-flight evaluation is essentially uncancellable mid-flight — go-jsonnet has no mid-evaluation cancellation — so CPU and memory limits must accommodate the worst-case concurrent eval load.

Set `--max-artifact-bytes` to cap the rendered output size per snippet so a runaway template can't allocate unbounded memory before the timeout fires. See [Evaluation and security](/usage/evaluation-and-security/) for the concurrent-eval cap, timeout defaults, and how to tune them.

```yaml
resources:
  memory: 256Mi
  cpu: 100m
operator:
  storage:
    maxArtifactBytes: 16777216  # 16 MiB; fails with ReasonArtifactTooLarge
```

## 3. Enable observability

The chart ships a metrics endpoint (on by default at port `8083`), an opt-in `ServiceMonitor`, and an opt-in `PrometheusRule` with a starter alert set.
Turn them on and wire the Prometheus selector labels before deploying: ```yaml operator: metrics: enabled: true serviceMonitor: enabled: true labels: release: kube-prom # match your Prometheus's serviceMonitorSelector prometheusRule: enabled: true labels: release: kube-prom # match your Prometheus's ruleSelector extraAlertLabels: team: platform # Alertmanager routing label ``` `serviceMonitor.labels`, `prometheusRule.labels`, and `prometheusRule.extraAlertLabels` are three distinct label knobs: `serviceMonitor.labels` and `prometheusRule.labels` control which Prometheus instance picks up each CRD object; `extraAlertLabels` adds routing labels to individual alerts (for Alertmanager), not to the rule object. The shipped alert set and all custom JaaS metrics are documented in [Observability](/usage/observability/). ## 4. Enable the admission webhook {#admission-webhook-tls} The webhook rejects spec invariant violations — ext-var key collisions, library alias shadowing, import cycles — at `kubectl apply` time instead of at reconcile time. Pick a cert mode: ```yaml # Option A: cert-manager (recommended when cert-manager is installed) operator: webhook: enabled: true certMode: cert-manager certManager: enabled: true issuerRef: kind: ClusterIssuer name: letsencrypt-prod # Option B: self-signed (no cert-manager required) operator: webhook: enabled: true certMode: self-signed ``` The default `failurePolicy: Fail` blocks every `JsonnetSnippet` create/update cluster-wide when the webhook is unavailable. During a rolling update the window is typically under five seconds (leader election releases the lease on SIGTERM). If your GitOps tooling cannot tolerate that, scope the webhook via `operator.webhook.namespaceSelector` or `operator.webhook.objectSelector`, or switch to `failurePolicy: Ignore` and rely on the reconciler-side fallback. Full cert provisioning and failurePolicy trade-offs are covered in [Admission webhook](/usage/admission-webhook/). ## 5. 
Lock down tenant RBAC Every `JsonnetSnippet` runs impersonated as its `spec.serviceAccountName` (or the `--default-service-account` fallback). The operator's own ServiceAccount only needs `serviceaccounts/token: create` — every other API call (library reads, source fetches, `ExternalArtifact` writes) is done under the tenant SA's RBAC, so a compromised snippet can only reach what its SA is allowed to. Minimum per-tenant `Role`: ```yaml apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: namespace: name: jaas-tenant rules: - apiGroups: [source.toolkit.fluxcd.io] resources: [externalartifacts] verbs: [get, create, update, patch] - apiGroups: [jaas.metio.wtf] resources: [jsonnetlibraries] verbs: [get, list] - apiGroups: [source.toolkit.fluxcd.io] resources: [gitrepositories, ocirepositories, buckets, externalartifacts] verbs: [get] ``` When `operator.watchNamespaces` is set, the chart automatically switches from a ClusterRoleBinding to per-namespace RoleBindings. Full RBAC layout and NetworkPolicy notes for `spec.sourceRef` fetches are in [Tenancy and RBAC](/usage/tenancy-and-rbac/). ## 6. Plan for upgrades and disaster recovery Calendar-based releases run every Monday. Chart upgrades are `helm upgrade --install`; the chart ships CRDs under `templates/` so schema changes apply automatically. Read [MIGRATIONS.md](https://github.com/metio/jaas/blob/main/MIGRATIONS.md) before each upgrade — releases that change immutable `spec.selector.matchLabels` fields require a manual `kubectl delete deploy/jaas` first. Three runbooks to bookmark before go-live: - [storage-recovery](/runbooks/storage-recovery/) — PVC loss, S3 outages, disk-full, downstream 404s. - [rbacdenied](/runbooks/rbacdenied/) — tenant SA missing a verb, ExternalArtifact write forbidden, Flux source CRD not installed. 
- [operator-watch-silent](/runbooks/operator-watch-silent/) — the one failure mode JaaS cannot surface in snippet status (operator's own ClusterRole missing a verb so controller-runtime's informer silently fails). ## A complete production values.yaml ```yaml replicas: min: 2 max: 5 resources: memory: 256Mi cpu: 100m image: pullPolicy: IfNotPresent namespace: create: true podSecurity: enforce: restricted operator: enabled: true defaultServiceAccount: "" # force per-snippet spec.serviceAccountName serviceAccount: create: true annotations: eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/jaas-operator # TODO storage: backend: s3 s3: endpoint: s3.amazonaws.com # TODO bucket: my-jaas-artifacts # TODO prefix: prod region: eu-west-1 # TODO maxArtifactBytes: 16777216 metrics: enabled: true serviceMonitor: enabled: true labels: release: kube-prom # TODO prometheusRule: enabled: true labels: release: kube-prom # TODO extraAlertLabels: team: platform # TODO webhook: enabled: true certMode: cert-manager certManager: issuerRef: kind: ClusterIssuer name: letsencrypt-prod # TODO leaderElection: enabled: true cleanupOnDelete: enabled: true ``` Apply with: ```shell helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \ --namespace jaas-system --create-namespace \ --values production-values.yaml \ --wait --timeout 5m ``` ## Next steps - [Operations](/installation/operations/) — day-two tasks: rolling restarts, storage sweeping, finalizer teardown. - [Configuration reference](/installation/configuration/) — every flag and default. - [Runbooks](/runbooks/) — incident response. --- # Tutorials Source: https://jaas.projects.metio.wtf/tutorials/ Complete, copy-paste paths from nothing to a working result. The two integration tutorials cover the JaaS side — authoring a snippet and publishing the artifact; the consuming side (how grafana-operator reconciles a dashboard, how StageSet gates a rollout) is linked from each tutorial rather than repeated here. 
--- # Deploying manifests with StageSet Source: https://jaas.projects.metio.wtf/tutorials/deploying-manifests/ JaaS pairs with [stageset-controller](https://stageset.projects.metio.wtf/) to deploy Kubernetes manifests as code: you author the manifests in Jsonnet, the JaaS operator renders and publishes them as a Flux `ExternalArtifact`, and stageset-controller rolls that artifact out across ordered, gated stages. This tutorial covers the JaaS side — authoring the manifests with top-level arguments and external variables, and publishing the rendered JSON. The rollout side (the `StageSet` resource, its stages, gates, and actions) lives on the stageset-controller site and is linked at the end. ## Prerequisites - The JaaS operator installed and a tenant ServiceAccount granted the `externalartifacts` write verbs. The [Quickstart](/tutorials/quickstart/) covers both. - stageset-controller installed, if you intend to follow the handoff section and roll the manifests out. This tutorial uses the namespace `default` and the tenant ServiceAccount `manifests-tenant`. ## Step 1 — Grant the tenant ServiceAccount its verbs The snippet publishes an `ExternalArtifact`, so the tenant needs the `externalartifacts` write verbs: ```shell cat <.tar.gz 5s ``` If `READY` is `False`, describe the snippet — the Ready condition's `Reason` and `Message` name the cause: ```shell kubectl --namespace default describe jsonnetsnippet web-app ``` ## Step 4 — Inspect the published manifests Fetch the artifact from a one-shot pod to see the rendered manifests: ```shell URL=$(kubectl --namespace default get jsonnetsnippet web-app -o jsonpath='{.status.artifactURL}') kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \ sh -c "curl -fsSL '$URL' | tar -xzO rendered.json" # { # "apiVersion": "v1", # "kind": "List", # "items": [ { "kind": "Deployment", ... }, { "kind": "Service", ... } ] # } ``` `rendered.json` is the manifest set a Flux consumer applies to the cluster. 
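
For a quick overview of what a rendered artifact contains, the object kinds can be listed from `rendered.json`. The helper below is a rough sketch that uses text matching rather than a JSON parser; `list_kinds` is a hypothetical name, and a tool such as `jq` would be more robust where it is available.

```shell
# list_kinds [FILE]
# Prints the value of every "kind" field found in FILE, one per line.
# Reads standard input when FILE is omitted. This is plain text matching,
# so it also reports the top-level kind (for example "List").
list_kinds() {
  cat "${1:--}" |
    grep -o '"kind"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}
```

Piping the extracted `rendered.json` from the fetch command above into `list_kinds` prints one kind per line, which makes it easy to confirm that the expected resources were rendered before handing the artifact to a consumer.
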
## Handoff: roll the manifests out with StageSet The published `ExternalArtifact` is now ready for a Flux consumer. A consumer references it in one of two ways: - **Directly** — name the `ExternalArtifact` (which shares the snippet's name and namespace) in a `sourceRef`. - **Producer-aware** — name the producing `JsonnetSnippet` and let the consumer resolve it to the `ExternalArtifact`. JaaS writes a three-field back-pointer (`apiVersion`, `kind`, `name`) under the artifact's `spec.sourceRef` for this, which is the contract producer-aware resolvers match on. stageset-controller consumes the published artifact the producer-aware way and rolls it out across ordered, gated stages. The `StageSet` resource, its stages, gates, and actions live on the stageset-controller documentation: - **stageset-controller producer-aware sources guide:** - **stageset-controller project:** Follow that guide for the rollout side; it picks up exactly where this tutorial leaves off — at the published `ExternalArtifact`. ## Where to go next - [Operator mode](/usage/operator-mode/) — the full operator reference, including the `ExternalArtifact` `spec.sourceRef` back-pointer contract that producer-aware consumers match on. - [Snippet sources](/usage/snippet-sources/) — back the manifests with a `GitRepository` or OCIRepository instead of inline `spec.files`. --- # Grafana dashboards Source: https://jaas.projects.metio.wtf/tutorials/grafana-dashboards/ JaaS pairs with the [grafana-operator](https://grafana.github.io/grafana-operator/) to manage Grafana dashboards as code: you author the dashboard in Jsonnet, the JaaS operator renders it and publishes the dashboard JSON as a Flux `ExternalArtifact`, and the grafana-operator reconciles that artifact into a live Grafana instance. This tutorial covers the JaaS side — authoring the dashboard, importing grafonnet as a `JsonnetLibrary`, and publishing the rendered JSON. 
The grafana-operator side (the `GrafanaDashboard` CR, datasources, folders) lives on their site and is linked at the end. ## Prerequisites - The JaaS operator installed and a tenant ServiceAccount granted the `externalartifacts` write verbs. The [Quickstart](/tutorials/quickstart/) covers both. - The grafana-operator installed, if you intend to follow the handoff section and reconcile the dashboard into Grafana. This tutorial uses the namespace `default` and the tenant ServiceAccount `dashboards-tenant`. ## Step 1 — Grant the tenant ServiceAccount its verbs The snippet imports a `JsonnetLibrary`, so on top of the `externalartifacts` write verbs the tenant needs `get` on `jsonnetlibraries`: ```shell cat <.tar.gz 5s ``` If `READY` is `False`, describe the snippet — the Ready condition's `Reason` and `Message` name the cause (an RBAC gap on the library, an import alias collision, or a Jsonnet error): ```shell kubectl --namespace default describe jsonnetsnippet api-latency ``` ## Step 5 — Inspect the published dashboard JSON Fetch the artifact from a one-shot pod to see the rendered dashboard: ```shell URL=$(kubectl --namespace default get jsonnetsnippet api-latency -o jsonpath='{.status.artifactURL}') kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \ sh -c "curl -fsSL '$URL' | tar -xzO rendered.json" # { # "panels": [ ... ], # "schemaVersion": 38, # "title": "API Latency" # } ``` `rendered.json` is the Grafana dashboard model — the exact JSON the grafana-operator hands to Grafana's dashboard API. ## Use real grafonnet instead of the toy helpers `grafana-helpers` kept this tutorial self-contained, but in production you import the real [grafonnet](https://github.com/grafana/grafonnet) library from a JOI image rather than hand-rolling constructors. 
Install it as a `JsonnetLibrary` with the [`joi` Helm chart](https://github.com/metio/helm-charts/tree/main/charts/joi): ```shell helm upgrade --install joi oci://ghcr.io/metio/helm-charts/joi \ --namespace default \ --set libraries.grafonnet.enabled=true ``` That renders an `OCIRepository` plus a `JsonnetLibrary` named `grafonnet`, sourcing `ghcr.io/metio/joi-grafana-grafonnet`. The snippet then references that library in place of `grafana-helpers` and imports the real grafonnet API by its full jb-vendor path: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency namespace: default spec: serviceAccountName: dashboard-renderer libraries: - kind: JsonnetLibrary name: grafonnet files: main.jsonnet: | local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet'; g.dashboard.new('API Latency') + g.dashboard.withUid('api-latency') ``` Everything downstream is unchanged — it reconciles and publishes an `ExternalArtifact` exactly as in Steps 4–5; only the source of the library differs. ## Handoff: reconcile the dashboard into Grafana The published `ExternalArtifact` is now ready for the grafana-operator to consume. The grafana-operator reconciles a JaaS-published dashboard into Grafana through a `GrafanaDashboard` CR that references the artifact. That configuration — the `GrafanaDashboard` resource, the datasource and folder wiring, and the `Grafana` instance — lives on the grafana-operator's own documentation: - **grafana-operator JaaS example:** - **grafana-operator project:** Follow that example for the Grafana side; it picks up exactly where this tutorial leaves off — at the published `ExternalArtifact`. ## Where to go next - [Jsonnet libraries](/usage/jsonnet-libraries/) — serve the full grafonnet tree as a `JsonnetLibrary` backed by an OCIRepository, with the empty-`path` whole-vendor-tree pattern. 
- [Snippet sources](/usage/snippet-sources/) — back the dashboard with a `GitRepository` or OCIRepository instead of inline `spec.files`, and point `spec.entryFile` at one dashboard in a multi-dashboard tree. --- # Local rendering Source: https://jaas.projects.metio.wtf/tutorials/local-rendering/ JaaS runs as a cluster-free Jsonnet renderer: point it at a directory of snippets and a directory of libraries, then `GET` a snippet name to receive the evaluated JSON. No Kubernetes, no operator mode, no Flux. The evaluation core is the same one the operator uses, so a snippet that renders correctly here renders identically in-cluster. This tutorial runs against this repository's `examples/` layout, so clone the repo first: ```shell git clone https://github.com/metio/jaas cd jaas ``` ## Step 1 — Get the binary or container image Pre-built binaries are attached to each [GitHub release](https://github.com/metio/jaas/releases). Download the archive for your platform, unpack it, and the `jaas` binary is inside. A container image is published at `ghcr.io/metio/jaas:latest`: ```shell docker pull ghcr.io/metio/jaas:latest ``` The examples below use a `jaas` binary on your `PATH`. To run the container instead, mount `examples/` and map the port — for example `docker run --rm -p 8080:8080 -v "$PWD/examples:/examples" ghcr.io/metio/jaas:latest` with the flags adjusted to the in-container `/examples` paths, and `--listen-address 0.0.0.0` so the port is reachable from the host. ## Step 2 — Run JaaS over the examples directory Start JaaS with one snippet directory and one library path: ```shell jaas \ --snippet-directory examples/snippets/dashboards \ --library-path examples/libraries ``` `--snippet-directory` exposes each subdirectory as a snippet whose name is the directory name and whose entry file is `main.jsonnet`. `--library-path` makes the libraries under `examples/libraries` importable by alias. Both flags repeat, so you can pass several of each. 
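The naming rule above can be previewed without starting JaaS. The following is a minimal sketch, assuming only the documented rule: a snippet is an immediate subdirectory that contains a `main.jsonnet`. The function name `list_snippets` and the scratch layout are hypothetical, and the function approximates the rule rather than reproducing JaaS's own resolution logic.

```shell
# Print the snippet names a directory would expose under the documented rule:
# one name per immediate subdirectory that contains a main.jsonnet.
list_snippets() {
  dir=$1
  for entry in "$dir"/*/main.jsonnet; do
    # Skip the unexpanded glob when no subdirectory matches.
    [ -f "$entry" ] || continue
    basename "$(dirname "$entry")"
  done
}

# Hypothetical scratch layout, for illustration only.
scratch=$(mktemp -d)
mkdir -p "$scratch/hello" "$scratch/empty"
printf '%s\n' '{ greeting: "hello" }' > "$scratch/hello/main.jsonnet"

list_snippets "$scratch"
```

Here only `hello` is printed, because `empty` has no `main.jsonnet`. Running `list_snippets examples/snippets/dashboards` from the repository root would list the names to request under `/jsonnet/`.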
The Jsonnet server binds `127.0.0.1:8080` by default. Confirm it started by hitting the readiness probe on the management server: ```shell curl -i http://127.0.0.1:8081/ready # HTTP/1.1 200 OK ``` ## Step 3 — Render a snippet `examples/snippets/dashboards/inheritance` is a self-contained snippet. Request it by directory name: ```shell curl http://127.0.0.1:8080/jsonnet/inheritance ``` JaaS returns the evaluated Jsonnet as JSON with `Content-Type: application/json`. The `library-precedence` snippet imports the `examplonet` library you exposed with `--library-path`: ```shell curl http://127.0.0.1:8080/jsonnet/library-precedence ``` A snippet name that resolves to no file returns a `404` with a JSON error body; a Jsonnet error returns a `400` carrying the go-jsonnet diagnostic. ## Step 4 — Pass a top-level argument Top-level arguments arrive as URL query parameters. The `multi-tla` snippet is `function(tags=["default"])` and joins its `tags` argument. Repeating a query key passes a list, which becomes a JSON array TLA: ```shell curl 'http://127.0.0.1:8080/jsonnet/multi-tla?tags=prod&tags=eu-west' # { # "count": 2, # "joined": "prod, eu-west", # "list": [ "prod", "eu-west" ] # } ``` A single occurrence of a query key (`?tags=prod`) passes a string instead of a one-element array. ## Step 5 — Set an external variable External variables are supplied through environment variables prefixed `JAAS_EXT_VAR_`. The variable after the prefix is the `std.extVar` key. Restart JaaS with the variables the `example1` snippet reads (`name` and `key`): ```shell JAAS_EXT_VAR_name=Alice \ JAAS_EXT_VAR_key=secret-value \ jaas \ --snippet-directory examples/snippets/dashboards \ --library-path examples/libraries ``` Then render the snippet: ```shell curl http://127.0.0.1:8080/jsonnet/example1 # { # ... # "person1": { # "external": "secret-value", # "name": "Alice", # "welcome": "Hello Alice!" # }, # ... 
# } ``` `std.extVar('name')` and `std.extVar('key')` resolve to the values from the environment. External variables are read once at startup, not per request. ## Same core as the operator The `jaas` binary evaluates Jsonnet through the same evaluation core whether it serves HTTP locally or reconciles a `JsonnetSnippet` in operator mode. Local rendering is the fast feedback loop for snippet authoring: a snippet that renders here — with the same libraries available — renders identically when the operator publishes it as an `ExternalArtifact`. ## Where to go next - [Rendering endpoint](/usage/rendering-endpoint/) — the request shape, snippet resolution, the management probes, and the stable error contract. - [Snippets and libraries](/usage/snippets-and-libraries/) — declaring snippets with `--snippet` and `--snippet-directory`, and libraries with `--library-path`. - [External variables and TLAs](/usage/external-variables-and-tlas/) — the full `JAAS_EXT_VAR_*` and query-parameter rules. --- # Quickstart Source: https://jaas.projects.metio.wtf/tutorials/quickstart/ This tutorial takes you from an empty cluster to one published Flux `ExternalArtifact` carrying rendered JSON. The path is operator mode with no optional knobs — no webhook, no S3, no Flux source CRs — and a single `JsonnetSnippet` whose source is inline `spec.files`. ## Prerequisites - A Kubernetes cluster. `kind`, `minikube`, or a managed cluster all work. - `kubectl` configured to talk to it. - `helm` 3.x. - Flux installed, at **v2.7.0 or newer**. A `JsonnetSnippet` publishes its result as a Flux `ExternalArtifact`, and the `ExternalArtifact` CRD lands in source-controller v1.7.0 (Flux v2.7.0) — earlier bundles have no such CRD and the publish path fails. 
Install all of Flux: ```shell kubectl apply -f https://github.com/fluxcd/flux2/releases/download/v2.7.0/install.yaml ``` ## Step 1 — Install the chart ```shell helm upgrade --install jaas oci://ghcr.io/metio/helm-charts/jaas \ --namespace jaas-system --create-namespace \ --set operator.enabled=true \ --set operator.defaultServiceAccount=default \ --wait --timeout 5m ``` `operator.defaultServiceAccount=default` tells the operator which ServiceAccount to impersonate in a tenant namespace when a snippet does not name its own. That is fine for this tutorial; production assigns a dedicated SA per tenant — see [Tenancy and RBAC](/usage/tenancy-and-rbac/). Verify the operator is running: ```shell kubectl --namespace jaas-system get deploy jaas # NAME READY UP-TO-DATE AVAILABLE AGE # jaas 1/1 1 1 30s ``` ## Step 2 — Grant the tenant ServiceAccount the minimum verbs The `default` ServiceAccount's built-in RBAC does not include the verbs the operator needs to publish the artifact. In the tenant namespace — here `default` — apply a `Role` and `RoleBinding`: ```shell cat <` — inline Jsonnet source. The default entry file is `main.jsonnet`; override it with `spec.entryFile`. - `spec.externalVariables` — `std.extVar()` lookups available to the snippet. Verify the resource exists: ```shell kubectl --namespace default get jsonnetsnippet hello ``` ## Step 4 — Confirm it reconciled ```shell kubectl --namespace default get jsonnetsnippet hello # NAME READY URL AGE # hello True http://jaas-storage.jaas-system.svc.cluster.local:8082/default/hello/.tar.gz 5s ``` The `URL` column is the artifact's address. 
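In a script, watching the `READY` column by hand can be replaced with `kubectl wait`. The following is a minimal sketch, assuming the snippet publishes its Ready condition under `status.conditions`, which is what `kubectl wait --for=condition=...` inspects. The function name `wait_for_snippet` and the 60-second default timeout are illustrative choices.

```shell
# Block until the named JsonnetSnippet reports Ready, or fail after the timeout.
# Arguments: namespace, snippet name, optional timeout (defaults to 60s).
# Assumes the Ready condition lives under status.conditions.
wait_for_snippet() {
  kubectl --namespace "$1" wait "jsonnetsnippet/$2" \
    --for=condition=Ready --timeout="${3:-60s}"
}
```

For this tutorial the call would be `wait_for_snippet default hello`. A non-zero exit status means the snippet did not become Ready within the timeout.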
If `READY` is `False`, describe the resource — the `Reason` and `Message` on the Ready condition name the problem (most commonly an RBAC gap or a Jsonnet syntax error): ```shell kubectl --namespace default describe jsonnetsnippet hello ``` The `ExternalArtifact` is the resource downstream Flux consumers read: ```shell kubectl --namespace default get externalartifact hello -o yaml # status: # artifact: # url: http://jaas-storage.jaas-system.svc.cluster.local:8082/default/hello/.tar.gz # digest: sha256: # revision: sha256: # size: ``` ## Step 5 — Fetch the rendered bytes The URL resolves in-cluster only. Fetch the tarball from a one-shot pod: ```shell URL=$(kubectl --namespace default get jsonnetsnippet hello -o jsonpath='{.status.artifactURL}') kubectl run --rm -i --restart=Never --image=docker.io/curlimages/curl:8.10.1 fetch -- \ sh -c "curl -fsSL '$URL' | tar -xzO rendered.json" # { # "greeting": "hello from jaas", # "rendered_at": "quickstart" # } ``` The tarball carries a single `rendered.json` because `spec.output` defaults to `rendered` (the evaluated JSON). Setting `spec.output: source` publishes the raw `.jsonnet` files instead, for consumers that re-evaluate themselves. ## Clean up ```shell kubectl --namespace default delete jsonnetsnippet hello kubectl --namespace default delete rolebinding jaas-tenant kubectl --namespace default delete role jaas-tenant helm --namespace jaas-system uninstall jaas kubectl delete namespace jaas-system ``` The chart's pre-delete hook waits for the snippet's finalizer to drop — which removes the `ExternalArtifact` — before the operator pod is removed, so the uninstall leaves no orphans. ## Where to go next - [Grafana dashboards](/tutorials/grafana-dashboards/) — render grafonnet dashboards and hand them to the grafana-operator. - [Deploying manifests with StageSet](/tutorials/deploying-manifests/) — render Kubernetes manifests and roll them out with stageset-controller. 
- [Operator mode](/usage/operator-mode/) — the full operator reference: source kinds, leader election, the artifact contract. - [Usage](/usage/) — every configuration knob, one page per concern. --- # Usage Source: https://jaas.projects.metio.wtf/usage/ One page per feature. JaaS has two faces over one evaluation core: the **HTTP renderer** and the **Flux operator** (`--enable-flux-integration`). The first pages below cover the renderer; the rest cover the operator. The [API reference](/api/jsonnetsnippet/) carries the exhaustive field-by-field detail. --- # Admission webhook Source: https://jaas.projects.metio.wtf/usage/admission-webhook/ JaaS ships an optional validating admission webhook for `JsonnetSnippet`. It is enabled separately from, but runs only on top of, [operator mode](/usage/operator-mode/). ## Enabling it Set `--enable-webhook` to boot the webhook server. It requires `--enable-flux-integration` (the webhook is wired only inside the operator boot path; enabling it alone is rejected as a flag error) and TLS material — `tls.crt` and `tls.key` — under `--webhook-cert-dir` (default `/tmp/k8s-webhook-server/serving-certs`). The server binds `--webhook-port` (default `9443`). ## What it validates The webhook rejects a `JsonnetSnippet` whose `spec.externalVariables` declares a key that collides with an operator-level `--ext-var`. An operator-supplied external variable always wins, so a snippet that tries to redeclare one would render against a value it does not control; the webhook refuses the snippet at admission time instead. The reconciler enforces the same invariant as a fallback, so a snippet that bypasses admission — for example when the webhook is unreachable under `failurePolicy: Ignore` — still fails at reconcile rather than rendering with the wrong value. ## Failure policy trade-off The Helm chart defaults to `operator.webhook.failurePolicy: Fail`.
With `Fail`, a webhook outage blocks every `JsonnetSnippet` create and update cluster-wide until the operator is back. During a rolling update that window is typically under five seconds, because leader election releases the lease on context-cancel and the next replica takes over immediately. If your CI or GitOps tooling cannot tolerate even that window, narrow or relax the webhook: - `operator.webhook.objectSelector` — match only snippets carrying a label, e.g. require `jaas.metio.wtf/managed: "true"`. - `operator.webhook.namespaceSelector` — opt in per namespace. - `failurePolicy: Ignore` — let create/update through when the webhook is unreachable, relying on the reconciler-side fallback to catch the colliding-key case. ## TLS provisioning `--webhook-cert-mode` selects how the serving certificate is provisioned. ### cert-manager (default) `--webhook-cert-mode=cert-manager` expects external tooling to provision `tls.crt`/`tls.key`. The Helm chart renders a `cert-manager.io/v1` Certificate and mounts the issued Secret into the pod at `--webhook-cert-dir`. cert-manager handles renewal; the webhook server hot-reloads TLS when the mounted files change. ### self-signed `--webhook-cert-mode=self-signed` makes the operator generate a CA and serving certificate in-pod, write them to `--webhook-cert-dir`, and patch the named `ValidatingWebhookConfiguration`'s `caBundle` so the apiserver trusts the chain. A background renewer regenerates and re-writes the files before expiry, and the webhook server hot-reloads without a restart. The relevant flags: | Flag | Purpose | |---|---| | `--webhook-validating-config-name` | Name of the `ValidatingWebhookConfiguration` whose `caBundle` is patched. Required in this mode. | | `--webhook-service-name` | Service name the webhook is reachable through; used to build the certificate SANs (default `jaas-webhook`). | | `--webhook-service-namespace` | Namespace the webhook Service lives in. 
Empty falls back to `--leader-election-namespace`, then to the in-cluster downward API. | | `--webhook-cert-validity` | Validity of the self-signed serving certificate (default `8760h`, one year). | | `--webhook-port` | Port the webhook server binds to (default `9443`). | In self-signed mode the operator needs `get`/`update` on the named `ValidatingWebhookConfiguration`. Multiple replicas bootstrapping at once during a rolling update converge on a combined `caBundle` rather than clobbering each other, so each replica's CA stays trusted across the rollout. The full flag list with defaults is on the [configuration page](/installation/configuration/). --- # Alerting Source: https://jaas.projects.metio.wtf/usage/alerting/ JaaS turns a sustained problem into a notification in two ways: a Prometheus `PrometheusRule` that pages on its [metrics](/usage/metrics/), and Kubernetes Events that Flux's notification-controller can route to chat or e-mail. ## The binary The operator emits a standard Kubernetes `Event` on every Ready-condition transition — `Normal` for `Synced`, `Warning` for every other reason. The reason string fills both the event `reason` and `action`. These Events need no flag to enable; they are written whenever the operator reconciles. The operator also threads runbook links into its own status automatically: every actionable Ready-condition Message gains a `(runbook: https://jaas.projects.metio.wtf/runbooks//)` suffix, so `kubectl describe jsonnetsnippet` points straight at the matching page. Healthy or intentional states (`Synced`, `Suspended`, `Pending`) get no suffix. ### Routing Events through Flux Routing the Events is the job of Flux's `notification-controller`: target an `Alert` CR at `kind: JsonnetSnippet`, and JaaS needs no `Provider`/`Alert` plumbing of its own.
```yaml apiVersion: notification.toolkit.fluxcd.io/v1beta3 kind: Alert metadata: name: jaas-snippets namespace: flux-system spec: providerRef: name: slack eventSeverity: warn # 'info' to include success events eventSources: - kind: JsonnetSnippet name: '*' ``` Wire whatever `Provider` you already use for Flux source CRs; see the [Flux notification-controller documentation](https://fluxcd.io/) for provider configuration. ### The alert catalog The chart ships a starter alert set on the custom metrics plus a handful of controller-runtime signals. Each alert carries its remediation page as a `runbook_url` annotation so Alertmanager renders a direct link: | Alert | Severity | Fires when | Threshold knobs (default) | Runbook | |---|---|---|---|---| | `JaaSSnippetReconcileErrorsHigh` | warning | A snippet keeps flipping to Ready=False (excluding `Synced`/`Suspended`/`Pending`). | `reconcileErrorRate` (0.1/s), `reconcileErrorDuration` (10m) | per-reason page under [`/runbooks/`](/runbooks/) | | `JaaSSnippetArtifactGrowing` | warning | p99 `jaas_snippet_rendered_bytes` exceeds the size ceiling. | `artifactSizeBytes` (16 MiB), `artifactSizeDuration` (30m) | [artifacttoolarge](/runbooks/artifacttoolarge/) | | `JaaSControllerWorkqueueDepthHigh` | warning | The `jsonnetsnippet` workqueue can't drain. | `workqueueDepth` (50), `workqueueDuration` (15m) | [workqueue-saturation](/runbooks/workqueue-saturation/) | | `JaaSReconcileLatencyHigh` | warning | p99 reconcile time crosses the ceiling. | `reconcileLatencySeconds` (30), `reconcileLatencyDuration` (15m) | [reconcile-latency](/runbooks/reconcile-latency/) | | `JaaSOperatorPodDown` | critical | A jaas pod stays NotReady. | `podDownDuration` (5m) | [operator-pod-down](/runbooks/operator-pod-down/) | | `JaaSStorageSweepFailures` | warning | Background sweeps fail per hour above the floor. 
| `sweepFailuresPerHour` (3), `sweepFailuresDuration` (30m) | [storage-recovery](/runbooks/storage-recovery/) | | `JaaSWebhookCertRenewalFailing` | critical | Self-signed cert renewal fails per hour above the floor. | `webhookCertRenewalFailuresPerHour` (1), `webhookCertRenewalFailuresDuration` (30m) | [webhook-cert-renewal](/runbooks/webhook-cert-renewal/) | | `JaaSTenantTokenMintFailing` | warning | Token mints fail for a `(namespace, serviceAccount)` pair. | `tenantTokenMintFailureRate` (0.01/s), `tenantTokenMintFailureDuration` (10m) | [rbacdenied](/runbooks/rbacdenied/) | | `JaaSForceDropsAccumulating` | warning | Snippet finalizers are force-dropped per hour above the floor. | `forceDropsPerHour` (0), `forceDropsDuration` (5m) | [storage-recovery](/runbooks/storage-recovery/) | | `JaaSCRDWatchEngagementFailing` | warning | A Flux source watch won't engage for a GVK. | `crdWatchEngagementFailuresPerHour` (1), `crdWatchEngagementFailuresDuration` (30m) | [crd-watch-engagement](/runbooks/crd-watch-engagement/) | | `JaaSEvalSaturation` | warning | In-flight evals exceed the saturation ratio of the cap (guarded on the cap being non-zero). | `evalSaturationRatio` (0.9), `evalSaturationDuration` (10m) | [eval-saturation](/runbooks/eval-saturation/) | | `JaaSEvalRejected` | warning | The semaphore turns evals away per second above the floor. | `evalRejectedRate` (0.05/s), `evalRejectedDuration` (10m) | [eval-saturation](/runbooks/eval-saturation/) | | `JaaSEvalLeakedGoroutines` | warning | Orphan eval goroutines persist above the floor — a runaway snippet. | `evalLeakedFloor` (0), `evalLeakedDuration` (5m) | [eval-saturation](/runbooks/eval-saturation/) | `JaaSSnippetReconcileErrorsHigh` templates its runbook URL on the failing reason, so it lands on the matching per-reason page under [`/runbooks/`](/runbooks/). Each Ready-condition reason and each alert maps to a remediation page there. 
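The runbook suffix on Ready-condition messages and the `runbook_url` alert annotation point at the same remediation pages, so a script can pull the link straight out of a status message. The following is a minimal sketch, assuming the `(runbook: ...)` suffix is the last thing in the message, as described under "The binary". The function name `runbook_url` and the sample message wording are made up for illustration.

```shell
# Extract the runbook URL from a Ready-condition message that ends with
# a "(runbook: <url>)" suffix. Prints nothing when no suffix is present.
runbook_url() {
  printf '%s\n' "$1" | sed -n 's/.*(runbook: \([^)]*\))$/\1/p'
}

# Illustrative message; the wording before the suffix is invented.
msg='tenant ServiceAccount lacks a required verb (runbook: https://jaas.projects.metio.wtf/runbooks/rbacdenied/)'
runbook_url "$msg"
```

Healthy states such as `Synced` carry no suffix, so the function prints nothing for them, which makes an empty result a usable "no action needed" signal in a script.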
## The Helm chart The `PrometheusRule` is opt-in under `operator.metrics.prometheusRule` and needs the Prometheus Operator's `monitoring.coreos.com/v1` API in the cluster: ```yaml operator: enabled: true metrics: enabled: true prometheusRule: enabled: true interval: 30s # Labels your Prometheus instance selects PrometheusRules on. labels: release: kube-prometheus # Merged onto every rendered alert — route all jaas alerts # through one Alertmanager receiver. extraAlertLabels: team: platform # Annotation key the runbook URL lands under (Prometheus-operator # convention is runbook_url). runbookAnnotationKey: runbook_url ``` Every threshold is a knob under `operator.metrics.prometheusRule.thresholds`, so the noise floor is tunable without copy-pasting rule bodies. To silence a built-in alert, raise its threshold to an impossibly high value — there is no per-alert disable toggle, and the threshold pattern keeps "this alert is intentionally inert" visible in the chart values. Cluster-specific rules append under a separate group via `operator.metrics.prometheusRule.extraRules`. --- # Creating source artifacts Source: https://jaas.projects.metio.wtf/usage/creating-sources/ A `JsonnetSnippet`'s `spec.sourceRef` consumes a Flux source. The operator reads the referenced source CR's `status.artifact.url`, downloads the tarball Flux's source-controller serves there, verifies its `status.artifact.digest`, and extracts it into the snippet's file tree. Every supported kind — `GitRepository`, `OCIRepository`, and `Bucket` — reaches the operator through that same `status.artifact` contract, so the operator never talks to a git remote, an OCI registry, or an object store directly. Flux owns that fetch; the operator consumes the artifact Flux already produced. The recipes below show how to produce each source kind so a snippet can reference it. [Snippet sources](/usage/snippet-sources/) covers wiring the finished source into a `JsonnetSnippet`. 
For the source CRDs themselves and their full field reference, see the [Flux documentation](https://fluxcd.io/) — only what a JaaS source needs is covered here. A `JsonnetSnippet` references the source you create with a `spec.sourceRef`: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: dashboards namespace: default spec: serviceAccountName: dashboards-tenant entryFile: dashboards/api-latency.jsonnet sourceRef: kind: GitRepository # or OCIRepository, or Bucket name: dashboards-source path: dashboards/ # optional: narrow extraction to a subtree ``` The tenant ServiceAccount needs `get` on the referenced source kind. See [Tenancy and RBAC](/usage/tenancy-and-rbac/) for the exact verbs. ## GitRepository A `GitRepository` source tracks a branch, tag, or commit of a git repository. source-controller clones the ref and packs the tree into the tarball the operator fetches. There is no packaging or layer constraint — the operator extracts whatever files the commit contains. 1. Lay out your Jsonnet files in a directory. File names and the directory structure carry over verbatim into the snippet's file tree, so place the entry file where `spec.entryFile` expects it: ```text dashboards/ ├── api-latency.jsonnet ├── error-budget.jsonnet └── lib/ └── panels.libsonnet ``` 2. Commit the files and push them to a git repository: ```shell git add dashboards/ git commit -m "Add Grafana dashboards" git push origin main ``` 3. Create the `GitRepository` source. With the Flux CLI: ```shell flux create source git dashboards-source \ --url=https://github.com/example-org/grafana-dashboards \ --branch=main \ --interval=5m \ --namespace=default \ --export ``` The equivalent CR YAML, which is authoritative: ```yaml apiVersion: source.toolkit.fluxcd.io/v1 kind: GitRepository metadata: name: dashboards-source namespace: default spec: interval: 5m url: https://github.com/example-org/grafana-dashboards ref: branch: main ``` 4. 
Point a snippet's `spec.sourceRef` at the source. Set `kind: GitRepository`, `name: dashboards-source`, and optionally `path:` to extract only a subtree: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: dashboards/api-latency.jsonnet sourceRef: kind: GitRepository name: dashboards-source path: dashboards/ ``` When a new commit lands on the tracked branch, source-controller republishes the artifact and the operator's watch re-renders the snippet. ## OCIRepository An `OCIRepository` source pulls an OCI artifact from a registry. source-controller unpacks the artifact's single gzipped-tar layer into the tarball the operator fetches. Producing the artifact with `flux push artifact` packs a directory into exactly that shape. 1. Lay out your Jsonnet files in a directory, the same as for a git source: ```text ./ ├── main.jsonnet └── lib/ └── panels.libsonnet ``` 2. Push the directory as an OCI artifact with the Flux CLI. `flux push artifact` packs the directory into one gzipped-tar layer and pushes it to the registry: ```shell flux push artifact oci://ghcr.io/example-org/dashboards:v1 \ --path=. \ --source="$(git config --get remote.origin.url)" \ --revision="$(git rev-parse HEAD)" ``` `--source` and `--revision` stamp provenance metadata onto the artifact; set them to a URL and a version identifier of your choosing. 3. Create the `OCIRepository` source. With the Flux CLI: ```shell flux create source oci dashboards-source \ --url=oci://ghcr.io/example-org/dashboards \ --tag=v1 \ --interval=5m \ --namespace=default \ --export ``` The equivalent CR YAML, which is authoritative: ```yaml apiVersion: source.toolkit.fluxcd.io/v1 kind: OCIRepository metadata: name: dashboards-source namespace: default spec: interval: 5m url: oci://ghcr.io/example-org/dashboards ref: tag: v1 ``` 4. 
Point a snippet's `spec.sourceRef` at the source with `kind: OCIRepository`: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: main.jsonnet sourceRef: kind: OCIRepository name: dashboards-source ``` > **Single layer is mandatory.** `flux push artifact` produces an OCI artifact > with exactly one gzipped-tar layer, which is what source-controller expects > and the only shape it unpacks. An artifact built any other way — a hand-rolled > `oras push` with one file per layer, a `Dockerfile`/container-image build, or > any tool that splits content across multiple layers — is not consumed > correctly. source-controller cannot reconstruct the file tree, the snippet's > source never resolves, and the snippet reports `Ready=False`. Always build OCI > sources with `flux push artifact`. Verify the layer count before relying on an artifact. Fetch the manifest and confirm the `layers` array has length 1: ```shell oras manifest fetch oci://ghcr.io/example-org/dashboards:v1 | \ jq '.layers | length' ``` A result of `1` is required. Any other number means the artifact was not built with `flux push artifact` and will not resolve. ### Private registries and Amazon ECR source-controller performs the pull, so registry credentials belong on the `OCIRepository` (or on source-controller itself) — never on the JaaS operator or a snippet's ServiceAccount. The same applies to a `JsonnetLibrary` whose `sourceRef` points at an `OCIRepository`. For a generic private registry, add a `spec.secretRef` to a `docker-registry` Secret. For **Amazon ECR you need no pull Secret at all**: set `spec.provider: aws` and source-controller authenticates with its own ambient AWS identity. 
On EKS that is an IRSA role bound to **source-controller's** ServiceAccount with ECR read permissions: ```yaml apiVersion: source.toolkit.fluxcd.io/v1 kind: OCIRepository metadata: name: dashboards-source namespace: default spec: interval: 5m provider: aws url: oci://111122223333.dkr.ecr.eu-west-1.amazonaws.com/dashboards ref: tag: v1 ``` The IRSA role needs `ecr:GetAuthorizationToken` (resource `*`) plus `ecr:BatchGetImage` and `ecr:GetDownloadUrlForLayer` on the repository. Because the credential is source-controller's, one role covers every `OCIRepository` it pulls, and the JaaS operator stays out of the registry path entirely. This IRSA role is source-controller's, not the JaaS operator's. The JaaS operator uses IRSA only for its own [S3 storage backend](/usage/storage-and-ha/) — a separate concern from pulling sources. There is a third way to load OCI content that does not go through a `sourceRef` at all: the chart can mount snippets and libraries from **OCI image volumes** (`snippets` / `additionalLibraries`), read straight from a registry into the pod. Those volumes are pulled by the **kubelet**, exactly like a container image — so they authenticate the way images do, not through IRSA. On EKS that means the **node's** IAM role with ECR read (the `AmazonEC2ContainerRegistryReadOnly` managed policy the default node role already carries), or an `imagePullSecret` on the pod. Pod-level IRSA grants the *pod's* ServiceAccount AWS API access, which the kubelet does not use when pulling images, so it is not the mechanism for this path. With a node role that can read ECR, image-volume snippets and libraries load with no pull Secret. Static OCI mounts and operator mode are mutually exclusive in one release, so a given install uses either these mounts or the `sourceRef` path above, not both. 
The [jsonnet-oci-images (JOI)](https://github.com/metio/jsonnet-oci-images) project enforces this same single-layer rule for every image it publishes, so its images are ready-made single-layer `OCIRepository` sources. Reference a JOI image directly when you need a shared Jsonnet library tree (grafonnet, the jsonnet-libs catalog) rather than building and maintaining your own OCI source. ## Bucket A `Bucket` source mirrors objects from an S3- or GCS-compatible bucket. source-controller fetches the matching objects and packs them into the tarball the operator fetches. There is no layer constraint: the only requirement is that the objects laid out under the bucket prefix form the file tree your snippet expects. 1. Produce the files to upload. Either upload the individual `.jsonnet` / `.libsonnet` files under a prefix, or pack them into a single archive. Both work; source-controller flattens the mirrored objects into the file tree: ```text dashboards/ ├── main.jsonnet └── lib/ └── panels.libsonnet ``` 2. Upload the files to the bucket under a prefix. With the AWS CLI against an S3-compatible endpoint: ```shell aws s3 cp dashboards/ s3://example-bucket/dashboards/ \ --recursive \ --endpoint-url=https://s3.example.com ``` 3. Create the `Bucket` source. With the Flux CLI: ```shell flux create source bucket dashboards-source \ --bucket-name=example-bucket \ --endpoint=s3.example.com \ --provider=generic \ --secret-ref=bucket-credentials \ --interval=5m \ --namespace=default \ --export ``` The equivalent CR YAML, which is authoritative: ```yaml apiVersion: source.toolkit.fluxcd.io/v1 kind: Bucket metadata: name: dashboards-source namespace: default spec: interval: 5m provider: generic bucketName: example-bucket endpoint: s3.example.com secretRef: name: bucket-credentials ``` The referenced Secret carries the bucket credentials (`accesskey` / `secretkey`). See the [Flux documentation](https://fluxcd.io/) for the Secret layout and provider-specific fields. 4.
Point a snippet's `spec.sourceRef` at the source with `kind: Bucket`. Use `path:` to extract only the prefix that holds your Jsonnet: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: main.jsonnet sourceRef: kind: Bucket name: dashboards-source path: dashboards/ ``` ## Which source should I use? | Source | Use when | |-----------------|-------------------------------------------------------------------------------------------------------| | `GitRepository` | Your Jsonnet is human-authored configuration living in a version-controlled git repository. | | `OCIRepository` | You want an immutable, content-addressed artifact; must be a single layer, and pairs with JOI images. | | `Bucket` | Your artifacts already live in S3- or GCS-compatible object storage. | --- # Evaluation and security Source: https://jaas.projects.metio.wtf/usage/evaluation-and-security/ JaaS runs Jsonnet on the server and returns the result over HTTP. Three caps bound each evaluation, and a small security model governs what a snippet and its callers can reach. Review and tune both sections before exposing the service to a wider audience. ## Evaluation caps | Flag | Default | Effect | |-------------------------|--------------------------|------------------------------------------------------------------------------| | `--evaluation-timeout` | `5s` | Wall-clock budget per evaluation. Exceeding it returns `504 evaluation_timeout`. `0` disables the timeout. | | `--max-stack` | `500` | Maximum Jsonnet call-stack depth. `0` uses go-jsonnet's own default. | | `--max-concurrent-evals` | `max(GOMAXPROCS*4, 16)` | In-flight evaluations allowed at once. Excess requests return `503 evaluation_unavailable`. `0` disables the cap. 
| ```shell ./jaas \ --snippet-directory examples/snippets/dashboards \ --evaluation-timeout 2s \ --max-stack 1000 \ --max-concurrent-evals 32 ``` The default for `--max-concurrent-evals` bounds worst-case goroutine pile-up under a runaway snippet. Each in-flight evaluation pins roughly one CPU for its working set, so raising the cap far above the available parallelism queues work without adding throughput. ## Security model **Library paths are an unrestricted read scope.** Any file reachable under a configured `--library-path`, or under a snippet's own directory, can be `import`-ed or `importstr`-ed by any snippet — go-jsonnet's importer does not sandbox per snippet. Scope these directories tightly. Never point them at `/`, `/etc`, or anywhere holding credentials. **Snippets are operator-controlled, not caller-controlled.** Callers supply only top-level arguments through the query string. Jsonnet's `import` and `importstr` require string-literal paths, so a TLA or external variable cannot construct an import path. Deploying a snippet authored by someone you do not trust is equivalent to running their code on the server. **Snippet name resolution is sandboxed.** The URL's snippet segment resolves through Go's `os.Root`, which rejects `..` traversal and symlinks that escape the configured snippet directory. A URL like `/jsonnet/../etc/passwd` returns `404`, even though the OS would otherwise resolve the path. **Evaluation has caps but no mid-flight cancellation.** `--evaluation-timeout` bounds wall-clock time and `--max-stack` bounds call-stack depth, but go-jsonnet cannot abort an evaluation already running. When the timeout fires, only the HTTP response is released (as `504 evaluation_timeout`); the slow snippet keeps consuming CPU in the background until it finishes naturally. Size container CPU and memory limits to absorb that worst case.
The Prometheus metrics `jaas_eval_in_flight` (gauge: live in-flight count), `jaas_eval_unavailable_total` (counter: cumulative cap rejections), and `jaas_eval_outstanding_timed_out` (gauge: evals still running after their request timed out) surface how close evaluation runs to these caps. See [Observability](/usage/observability/) for detail. The HTTP status codes these caps produce are documented in the [rendering endpoint](/usage/rendering-endpoint/) error contract. --- # External variables and TLAs Source: https://jaas.projects.metio.wtf/usage/external-variables-and-tlas/ JaaS feeds two kinds of input into an evaluation: external variables, set by the process owner at startup, and top-level arguments, supplied per request through the URL query string. ## External variables External variables come from two sources. The environment mechanism reads every variable prefixed with `JAAS_EXT_VAR_` — the suffix is the variable name: ```shell JAAS_EXT_VAR_name=Alice \ JAAS_EXT_VAR_key=secret \ ./jaas --snippet-directory examples/snippets/dashboards ``` The `--ext-var KEY=VALUE` flag does the same and is repeatable. On a key conflict, the flag takes precedence over the environment value: ```shell ./jaas \ --snippet-directory examples/snippets/dashboards \ --ext-var name=Alice \ --ext-var key=secret ``` A snippet reads a variable with `std.extVar`: ```jsonnet { person1: { name: std.extVar('name'), external: std.extVar('key'), }, } ``` Fetching `example1` with those variables set produces: ```shell curl http://127.0.0.1:8080/jsonnet/example1 ``` ```json { "person1": { "external": "secret", "name": "Alice", "welcome": "Hello Alice!" } } ``` External variables are fixed at startup. Callers cannot set them per request — that is what top-level arguments are for. ## Top-level arguments A snippet that evaluates to a function receives top-level arguments (TLAs) from the URL query string. 
The `tla-example` snippet is such a function: ```jsonnet function(something="value", other="more", required) { person1: { welcome: 'Hello ' + something + '!', key: other, required: std.parseJson(required), }, } ``` Each query parameter sets a TLA. A single value becomes a string: ```shell curl 'http://127.0.0.1:8080/jsonnet/tla-example?something=Ada&required=42' # {"person1":{"key":"more","required":42,"welcome":"Hello Ada!"},...} ``` A repeated parameter becomes a list. The `multi-tla` snippet joins whatever it receives: ```jsonnet function(tags=["default"]) { count: std.length(tags), list: tags, joined: std.join(", ", tags), } ``` ```shell curl 'http://127.0.0.1:8080/jsonnet/multi-tla?tags=blue&tags=green' # {"count":2,"joined":"blue, green","list":["blue","green"]} ``` A bare parameter with no value sets the TLA to an empty string: ```shell curl 'http://127.0.0.1:8080/jsonnet/tla-example?something&required=0' ``` For the request and response shape these examples ride on, see the [rendering endpoint](/usage/rendering-endpoint/). --- # JOI images Source: https://jaas.projects.metio.wtf/usage/joi-images/ [Jsonnet OCI Images](https://github.com/metio/jsonnet-oci-images) (JOI) package popular Jsonnet libraries as single-layer OCI images, one per upstream library, published at `ghcr.io/metio/joi--`. Because each image is a single layer, the same artifact serves two roles: a container **image volume** mounted into jaas, and a Flux **`OCIRepository`** source the operator fetches — so a snippet imports a vendored library without bundling it. Deploy them with the [joi Helm chart](/installation/helm-values/#joi-library-chart), which renders a `JsonnetLibrary` + `OCIRepository` pair for each enabled library. 
A snippet then imports a library by its alias, choosing the version in the import path: ```jsonnet import 'github.com/jsonnet-libs/k8s-libsonnet/1.34/main.libsonnet' ``` The catalog below is generated from the [jsonnet-oci-images manifest](https://github.com/metio/jsonnet-oci-images/blob/main/libraries.json), so it always reflects the currently published set. Pin an image with the moving `:latest` tag or an immutable dated `:` snapshot. {{< joi-images >}} --- # Jsonnet libraries Source: https://jaas.projects.metio.wtf/usage/jsonnet-libraries/ Snippets import reusable Jsonnet from two places: namespaced `JsonnetLibrary` custom resources and OCI-mounted shared libraries the operator carries on disk. Both feed the same import-alias namespace, so a snippet's `import` statements look identical regardless of where the library comes from. ## The JsonnetLibrary CRD A `JsonnetLibrary` is a namespaced bundle of `.libsonnet` files. Like a snippet, it declares exactly one source — inline `spec.files` or a `spec.sourceRef` to a Flux source (`GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`). The library carries no registration name of its own; the import alias is chosen on the snippet side. ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetLibrary metadata: name: grafana-helpers namespace: default spec: files: dashboard.libsonnet: | { new(title): { title: title, panels: [], schemaVersion: 38, }, } panel.libsonnet: | { graph(title): { type: 'graph', title: title }, stat(title): { type: 'stat', title: title }, } ``` A `JsonnetLibrary` whose `spec.sourceRef` points at an `OCIRepository` lets you ship a jb-vendored library tree (grafonnet, docsonnet, and similar) as an OCI artifact and import it from snippets without inlining every file. ## Referencing a library from a snippet A snippet enumerates the libraries it can import in `spec.libraries[]`. Each entry is a `LibraryRef`: - `kind` — `JsonnetLibrary` (the only library kind). 
- `name` — the `JsonnetLibrary` resource's name. - `importPath` — the alias the snippet's `import` statements use. Defaults to the library's `name`. A library not listed in `spec.libraries` is invisible to the snippet even when it exists in the same namespace — the enumeration is the allowlist. ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: my-dashboard namespace: default spec: serviceAccountName: grafana-tenant entryFile: main.jsonnet files: main.jsonnet: | local dashboard = import 'grafana/dashboard.libsonnet'; local panel = import 'grafana/panel.libsonnet'; dashboard.new('API Latency') + { panels: [ panel.graph('p99 by route'), panel.stat('error rate'), ], } libraries: - kind: JsonnetLibrary name: grafana-helpers importPath: grafana ``` The snippet imports `grafana/dashboard.libsonnet` because the `LibraryRef` sets `importPath: grafana`. Drop `importPath` and the alias defaults to the library's name, `grafana-helpers`. The operator reads the `JsonnetLibrary` through the tenant's impersonating client, so the snippet's ServiceAccount needs `get` on `jsonnetlibraries.jaas.metio.wtf` — see [Tenancy and RBAC](/usage/tenancy-and-rbac/). ## OCI-mounted shared libraries Cluster-wide shared libraries are mounted into the operator pod's filesystem rather than expressed as CRs. The operator scans every `--library-path` directory at startup, reads every `.libsonnet` / `.jsonnet` / `.json` file into memory, and folds those entries into every snippet's import namespace additively — after the snippet's own `LibraryRef` resolution. A snippet imports an OCI-mounted library by alias with no `LibraryRef` at all: ```jsonnet local grafonnet = import 'grafonnet/main.libsonnet'; ``` With the Helm chart this is the `additionalLibraries` value, which mounts each configured OCI artifact under a `--library-path` directory. 
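
As a rough picture of the startup scan described above, the sketch below walks a directory and loads every `.libsonnet`, `.jsonnet`, and `.json` file into memory. It is Python, not the operator's Go code. The helper name `scan_library_path` and the choice to key each entry by its path relative to the library root are assumptions made for illustration; the real alias derivation may differ.

```python
import os

LIBRARY_SUFFIXES = (".libsonnet", ".jsonnet", ".json")


def scan_library_path(root):
    """Load every library file under root, keyed by its relative path."""
    entries = {}
    for directory, _subdirs, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(LIBRARY_SUFFIXES):
                continue  # ignore READMEs, licenses, and other files
            full_path = os.path.join(directory, filename)
            # Use forward slashes so keys look like Jsonnet import paths.
            key = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, encoding="utf-8") as handle:
                entries[key] = handle.read()
    return entries
```

Under these assumptions, a mount containing `grafonnet/main.libsonnet` yields an entry keyed `grafonnet/main.libsonnet`, matching the import string in the example above.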
There is deliberately no cluster-scoped library CRD: a snippet produces a namespaced `ExternalArtifact`, so producers stay namespaced, while genuinely cluster-wide shared libraries take the OCI-mount path. This is also the path the cluster-free local renderer uses, so the same library tree renders identically on a workstation and in the cluster. ### Library-alias safety An OCI-mounted alias and a `LibraryRef` alias must not collide. When the operator starts with `--library-path` flags it records every mounted alias, and both admission and the reconciler reject any `LibraryRef` whose `importPath` (or, when `importPath` is omitted, the library `name`) shadows one of those names. This catches the case where grafonnet is mounted via OCI but a `JsonnetLibrary` `LibraryRef` is also aliased `grafonnet` — the additive merge would otherwise resolve the collision silently in favor of the CR. Rename the import alias or remove the `LibraryRef` to resolve the rejection. ## JsonnetLibrary vs a source-output snippet A `JsonnetLibrary` and a `JsonnetSnippet` rendered in `source` output mode both hand Jsonnet to another snippet, so they can look interchangeable. They differ on one axis — whether they publish an artifact: | | `JsonnetLibrary` | `source`-output `JsonnetSnippet` | |---|---|---| | Role | Passive dependency | Active producer | | Reached by | `import` by alias | `spec.sourceRef` to its `ExternalArtifact` | | When it loads | In-process during a snippet's evaluation | Fetched as a tarball before evaluation | | Scope | Same namespace as the importing snippet | Cross-namespace via the `ExternalArtifact` | | Publishes an artifact | No | Yes — content-addressed and revisioned | A `JsonnetLibrary` is a passive dependency: a snippet lists it in `spec.libraries`, imports it by alias, and the operator folds its files into the import namespace in-process while evaluating. It publishes no artifact and is visible only within its own namespace. 
A `source`-output `JsonnetSnippet` is an active producer: it publishes an `ExternalArtifact` carrying its raw Jsonnet. That artifact is content-addressed, revisioned, and consumable across namespaces, so a downstream snippet pins it with `spec.sourceRef` and re-evaluates the Jsonnet itself. Use a `JsonnetLibrary` for shared helpers your snippets import by alias. Use `output: source` chaining when one snippet's Jsonnet should feed another as a pinned Flux artifact — see [Chaining snippets](/usage/snippet-sources/#chaining-snippets). ## Import resolution The operator's in-memory importer resolves `import` and `importstr` statements with the same semantics as `jsonnet -J vendor`. A jb-vendored library tree renders identically on the operator path and locally — this parity is the reason the same Jsonnet works on a workstation and in the cluster without change. For an import path, resolution proceeds: 1. **Sibling-relative** — relative to the importing file within its own root, so a bare `import 'dashboard.libsonnet'`, `./x`, or `../x` resolves against the importing file's directory first. 2. **Bare alias** — a registered alias on its own resolves to that library's `main.libsonnet`. 3. **Alias plus file** — `alias/file` resolves `file` within the registered alias's tree; the alias head is authoritative. 4. **JPATH / vendor search** — the import path is searched across the snippet's own files and then every library, which is what lets an absolute `import 'github.com/grafana/grafonnet/gen/...'` resolve against a library whose tree carries the full vendor path. Sibling files win over a library's default entry, matching `jsonnet -J vendor`. A slash-prefixed path whose head is not a registered alias is not an error — it falls through to the vendor search. ## Related pages - [Snippet sources](/usage/snippet-sources/) — where a snippet's own Jsonnet comes from, including the same `sourceRef` mechanism libraries use. 
- [Snippets and libraries](/usage/snippets-and-libraries/) — the on-disk equivalent for the HTTP renderer, including `--library-path` precedence. --- # Logging Source: https://jaas.projects.metio.wtf/usage/logging/ JaaS logs through Go's `log/slog`. Every request, reconcile, and lifecycle event is a structured record you can filter and parse rather than scrape with a regex. In operator mode, controller-runtime's own output — leader election, cache sync, manager startup — flows through the **same** slog handler via the logr bridge (`ctrl.SetLogger(logr.FromSlogHandler(...))`), so the manager's logs share the configured level and format instead of emitting controller-runtime's default zap output. ## The binary Two flags control logging. They apply in every mode JaaS runs in: - `--log-level` — `debug`, `info`, `warn`, or `error`. Default `info`. - `--log-format` — `json` or `text`. Default `json`. `json` emits one JSON object per line, the right choice for a log pipeline (Loki, Elasticsearch, Cloud Logging) that indexes structured fields. `text` emits human-readable key=value lines, handy when tailing logs at a terminal during local development. The full flag list with defaults is on the [configuration page](/installation/configuration/). ### Reading logs With the default JSON format, pipe `kubectl logs` through `jq`. Tail the operator and pretty-print: ```shell kubectl --namespace jaas logs deployment/jaas --follow | jq . ``` Filter to warnings and errors only: ```shell kubectl --namespace jaas logs deployment/jaas | jq 'select(.level == "WARN" or .level == "ERROR")' ``` Follow a single snippet's reconciles by selecting on the logged fields: ```shell kubectl --namespace jaas logs deployment/jaas | jq 'select(.namespace == "team-a" and .name == "dashboards")' ``` Turn `--log-level=debug` on temporarily to see per-request evaluation detail and the operator's reconcile decisions; leave it at `info` in production to keep the volume down. 
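
When `jq` is not at hand, the same filters can be approximated in a few lines of Python. This is a sketch that assumes each log line is one JSON object carrying the `level`, `namespace`, and `name` fields the `jq` examples above select on; `filter_records` is a hypothetical helper, not part of JaaS.

```python
import json


def filter_records(lines, levels=None, **fields):
    """Yield parsed log records matching the given levels and field values."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        if levels and record.get("level") not in levels:
            continue
        if all(record.get(key) == value for key, value in fields.items()):
            yield record


# Usage idea: feed it the lines of `kubectl logs` output, for example
#   filter_records(sys.stdin, levels={"WARN", "ERROR"})
#   filter_records(sys.stdin, namespace="team-a", name="dashboards")
```

Lines that are not valid JSON are skipped rather than raising, so the helper tolerates stray non-JSON output mixed into the stream.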
## The Helm chart The chart exposes both flags under `arguments`: ```yaml arguments: # debug, info, warn, error logLevel: info # json, text logFormat: json ``` These map one-to-one onto `--log-level` and `--log-format`. Keep `logFormat: json` for any cluster whose logs are ingested by a structured pipeline; switch to `text` only for ad-hoc local clusters where a human reads the raw stream. --- # Metrics Source: https://jaas.projects.metio.wtf/usage/metrics/ The JaaS operator exposes a Prometheus metrics endpoint covering controller-runtime's standard families plus a custom `jaas_*` family the reconciler registers. Scrape it for dashboards and feed it into the shipped [alerts](/usage/alerting/). ## The binary controller-runtime's Prometheus endpoint binds `--metrics-bind-address` (default `:8083`), serving the standard text exposition format at `/metrics`. Setting it to `0` disables the endpoint. The default deliberately avoids controller-runtime's built-in `:8080`, which would collide with the Jsonnet HTTP port. The full flag list with defaults is on the [configuration page](/installation/configuration/). ### Metrics reference The operator exports these custom `jaas_*` metrics, registered against controller-runtime's registry so they ride the same `/metrics` endpoint: | Metric | Type | Labels | Meaning | |---|---|---|---| | `jaas_snippet_reconcile_total` | counter | `namespace`, `name`, `status`, `reason` | One bump per reconcile that touches the Ready condition. `status` is `True`/`False`; `reason` is the Reason constant from the snippet's condition. | | `jaas_snippet_rendered_bytes` | histogram | `namespace`, `name` | Rendered artifact size, observed only on `Synced` reconciles. Buckets run 256 B…64 MiB. | | `jaas_snippet_rate_limited_total` | counter | `namespace`, `name` | Reconciles deferred by the per-snippet token bucket. Paired with the `RateLimited` Warning event. 
| | `jaas_snippet_eval_unavailable_total` | counter | `namespace`, `name` | Reconciles deferred because the global concurrent-eval cap was full. Paired with the `EvalUnavailable` Warning event. | | `jaas_snippet_force_drop_total` | counter | `namespace`, `name`, `reason` | Snippets whose finalizer was force-dropped because `Publisher.Withdraw` kept failing past `--max-withdraw-wait` or hit a permanent API error. `reason` names the trigger (`withdraw_timed_out`, `tenant_client_permanent`, `withdraw_permanent`). Sustained non-zero values mean orphaned tarballs are accumulating; see the [storage-recovery runbook](/runbooks/storage-recovery/). | | `jaas_eval_in_flight` | gauge | — | Evaluations currently holding a slot in the global concurrent-eval semaphore. Reads through to the live count on every scrape. | | `jaas_eval_max_concurrent` | gauge | — | Configured ceiling of the semaphore (`--max-concurrent-evals`). Zero means the gate is disabled — any saturation alert must guard on this being non-zero. | | `jaas_eval_unavailable_total` | counter | — | Process-global accumulator of evaluations the semaphore rejected, across the HTTP and operator paths. Monotonic; resets on restart. | | `jaas_eval_outstanding_timed_out` | gauge | — | Evaluation goroutines whose parent's context fired before the synchronous go-jsonnet call returned. Sustained non-zero readings flag a runaway snippet. | | `jaas_storage_sweep_failures_total` | counter | — | Background storage-sweep passes that returned an error. The sweep removes orphaned `.tar.gz.tmp` residue; failures here don't block reconciles but let stale files accumulate. | | `jaas_webhook_cert_renewal_failures_total` | counter | — | Self-signed cert renewal attempts that returned an error. Sustained non-zero values flag RBAC drift or a write-permission loss on `--webhook-cert-dir`; the existing cert's natural expiry is the deadline before admission breaks cluster-wide. 
| | `jaas_tenant_token_mint_failures_total` | counter | `namespace`, `serviceAccount` | `TokenRequest` mints that returned an error. Sustained non-zero values on a pair indicate revoked `serviceaccounts/token: create` or a deleted namespace; affected snippets pin Ready=Unknown. | | `jaas_crd_watch_engagement_failures_total` | counter | `gvk` | `EngageFluxWatch` calls that returned an error. Sustained non-zero values on a GVK mean dependent snippets won't re-render on upstream source events until the watch engages. | The eval gauges (`jaas_eval_in_flight`, `jaas_eval_max_concurrent`, `jaas_eval_outstanding_timed_out`) reflect the global concurrent-eval cap; see [evaluation and security](/usage/evaluation-and-security/) for how that cap works and how to size `--max-concurrent-evals`. Alongside these, controller-runtime contributes its standard families for free — `controller_runtime_reconcile_total`, `controller_runtime_reconcile_time_seconds`, the `workqueue_*` series (depth, latency, retries), and the Go/process collectors. The shipped [alerts](/usage/alerting/) build on both the `jaas_*` metrics and these controller-runtime signals. ### Querying with PromQL Once scraped, a few PromQL queries answer the common questions: ```promql # Rate of failed reconciles per snippet, excluding healthy/intentional states. sum by (namespace, name) ( rate(jaas_snippet_reconcile_total{status="False",reason!~"Synced|Suspended|Pending"}[5m]) ) # Eval semaphore saturation, guarded on the gate being enabled. jaas_eval_in_flight / jaas_eval_max_concurrent and jaas_eval_max_concurrent > 0 # p99 rendered artifact size per snippet. histogram_quantile(0.99, sum by (namespace, name, le) (rate(jaas_snippet_rendered_bytes_bucket[30m]))) ``` ## The Helm chart The metrics port is set under `ports`, and the chart wires it to a dedicated `jaas-metrics` Service whenever the operator is enabled: ```yaml ports: # controller-runtime metrics endpoint; maps to --metrics-bind-address. 
# Set to 0 in operator.metrics.enabled to disable entirely. metrics: 8083 ``` Scraping is configured under `operator.metrics`. A `ServiceMonitor` for the Prometheus Operator is opt-in — it selects the `jaas-metrics` Service and scrapes its `metrics` port at `/metrics`: ```yaml operator: enabled: true metrics: enabled: true serviceMonitor: enabled: true interval: 30s scrapeTimeout: 10s # Labels your Prometheus instance selects ServiceMonitors on. labels: release: kube-prometheus ``` Without the Prometheus Operator, point a plain Prometheus scrape config at the `jaas-metrics` Service (port `8083`, path `/metrics`), or add the usual `prometheus.io/scrape` annotation set to the pod through `pod.additionalLabels` and let a kubernetes-pods scrape job discover it. To turn the alerts on, see [Alerting](/usage/alerting/). --- # Network policy Source: https://jaas.projects.metio.wtf/usage/network-policy/ The Helm chart ships an opt-in `NetworkPolicy` for the JaaS pod. It is off by default and renders only when `networkPolicy.enabled` is `true`. Two independent layers are on offer: pod-scoped allowlists that lock down only JaaS's own pods (the safe default), and an additional namespace-wide default-deny for a zero-trust namespace. The ingress and egress tables below describe exactly what traffic JaaS depends on — in both renderer mode and [operator mode](/usage/operator-mode/) — so everything else can be denied. ## Two layers: pod-scoped allowlists vs. namespace default-deny `networkPolicy.enabled: true` renders per-workload, **pod-scoped allowlist** policies. They select only JaaS's own pods through their `app.kubernetes.io/*` labels and lock down just those pods to the required ports. This is the safe default and is fine in a shared namespace: co-located workloads — including anything in `flux-system` if JaaS shares that namespace — are untouched. 
`networkPolicy.defaultDeny.enabled` (default `false`) **additionally** renders a namespace-wide default-deny so every pod in the namespace is denied by default and the allowlists become the only exceptions (a zero-trust namespace). The default-deny sits at a lower precedence than the allowlists, so the allowlists always win for the JaaS pods while everything else is denied. Pick the layer that matches namespace ownership: - **`defaultDeny.enabled: false`** (default) — pod-scoped setup. Only JaaS's pods are locked down; neighbours keep whatever posture their own policies give them. - **`defaultDeny.enabled: true`** — namespace zero-trust. Enable this **only when JaaS owns its namespace**, because the deny-all also denies every co-located workload that does not have its own allowing policy. `defaultDeny.order` (default `2000`) tunes the Calico `order` / ClusterNetworkPolicy `priority` that keeps the deny-all subordinate to the allowlists. The `kubernetes` and `cilium` engines have no precedence knob — deny and allow combine additively and allow wins — so the value matters only for the `calico` and `clusterNetworkPolicy` engines. ```yaml networkPolicy: enabled: true defaultDeny: enabled: true # only when JaaS owns this namespace order: 2000 ``` ## Choosing a policy engine `networkPolicy.engine` selects which policy dialect the chart renders. It is explicit, not auto-detected: a chart that sniffed the running CNI would render different objects on different clusters from identical values, which breaks GitOps determinism. You name the engine, and the rendered manifest is the same everywhere. 
| `engine` | Renders | API | FQDN egress | | --- | --- | --- | --- | | `kubernetes` (default) | `NetworkPolicy` | `networking.k8s.io/v1` | No | | `cilium` | `CiliumNetworkPolicy` | `cilium.io/v2` | Yes — free `toFQDNs` egress | | `calico` | `NetworkPolicy` | `projectcalico.org/v3` | No — OSS Calico has no FQDN egress; that is Calico Enterprise only | | `clusterNetworkPolicy` | `ClusterNetworkPolicy` | `policy.networking.k8s.io/v1alpha2` | No | `clusterNetworkPolicy` renders the SIG-Network `ClusterNetworkPolicy` that consolidates the deprecated `AdminNetworkPolicy` + `BaselineAdminNetworkPolicy` APIs into one resource. It is alpha, cluster-scoped, and rendered in the `Baseline` tier so a developer-authored `NetworkPolicy` still takes precedence over it. ```yaml networkPolicy: enabled: true engine: cilium ``` The per-port `.from` knobs documented under [Configuring ingress](#configuring-ingress) apply to the `kubernetes` engine only. For the other engines the allowlists are pod-scoped allow-all on the required ports, and you tighten them through that engine's native passthrough lists — `networkPolicy..ingress` and `networkPolicy..egress` — which are merged verbatim into the rendered policy's `spec`. For example, adding identity-based ingress and a `toFQDNs` egress under the Cilium engine: ```yaml networkPolicy: enabled: true engine: cilium cilium: ingress: - fromEndpoints: - matchLabels: app.kubernetes.io/name: kustomize-controller egress: - toFQDNs: - matchName: bucket.example.com toPorts: - ports: - port: "443" protocol: TCP ``` ## Required traffic The traffic JaaS needs depends on the mode it runs in. The renderer-mode rows apply to every install; the operator-mode rows apply only when `operator.enabled` is `true`. ### Ingress | Port | Source | Mode | Selectable by label? 
| |---|---|---|---| | Jsonnet HTTP (`ports.http`, `8080`) | Callers of the `/jsonnet` endpoint, or an Ingress controller fronting the Service | always | Yes — or open when an Ingress fronts it | | Management probes (`ports.management`, `8081`) | The kubelet, dialing the readiness, liveness, and startup probes from the node IP | always | No — the node IP is not a pod, so it cannot be a `podSelector` | | Storage HTTP (`ports.storage`, `8082`) | The Flux consumers that dereference `ExternalArtifact` tarballs — kustomize-controller, helm-controller, and custom consumers such as stageset-controller | operator | Yes — by consumer namespace | | Webhook (`ports.webhook`, `9443`) | The kube-apiserver, dialing the validating admission webhook | operator + webhook | No — the apiserver is not a pod | | Metrics (`ports.metrics`, `8083`) | Prometheus scraping `/metrics` | operator + metrics | Yes — by the scraper's pod or namespace | The Jsonnet HTTP and management ports always get an ingress rule. The storage, webhook, and metrics ports each get their own rule when their mode is active — storage when `operator.enabled`, webhook when the operator's webhook is enabled, and metrics when the operator's metrics endpoint is enabled. The kubelet and the apiserver source traffic from addresses that are not pods, so their rules cannot be narrowed with a `podSelector` or `namespaceSelector`. Leaving the management and webhook `from` lists empty keeps those ports reachable, which is what lets probes succeed and the apiserver reach the webhook. Authenticity on the webhook port is enforced by TLS and the CA bundle on the `ValidatingWebhookConfiguration`, not by the network layer — see the [admission webhook page](/usage/admission-webhook/). ### Egress Egress only matters when you opt into it (`networkPolicy.egress.enabled`). The JaaS operator needs the following outbound flows; in renderer mode it needs only DNS, if that. | Destination | Purpose | Mode | Selectable by label? 
| |---|---|---|---| | Cluster DNS | Name resolution — without it every other egress flow fails | always | Yes — by the DNS namespace | | kube-apiserver | TokenRequest minting, CR reads, `ExternalArtifact` writes, leader election, and webhook caBundle patching | operator | No — `ipBlock` CIDR only | | source-controller | Fetching upstream artifacts for snippets that use a `sourceRef` | operator | Yes — the `flux-system` namespace | | S3 endpoint | Reading and writing tarballs when `storage.backend` is `s3` | operator + S3 | Depends — in-cluster MinIO is label-selectable; an external bucket is `ipBlock` only | | OTLP collector | Shipping traces when `operator.tracing.endpoint` is set | operator + tracing | Depends — in-cluster collector is label-selectable; an external one is `ipBlock` only | The kube-apiserver is never label-selectable, so its egress rule must be an `ipBlock` CIDR. The same applies to any S3 bucket or OTLP collector that lives outside the cluster. ## Configuring ingress Under the `kubernetes` engine, enable the policy and tighten each port through its `from` knob. An empty `from` list leaves that port open; a non-empty list restricts it to the listed peers. ```yaml networkPolicy: enabled: true # Open by default — typical when an Ingress fronts the Service. Set a # from-list to restrict callers of the /jsonnet endpoint. http: from: [] # Leave empty — the kubelet probes source from the node IP. management: from: [] # Leave empty — the kube-apiserver cannot be expressed as a podSelector. webhook: from: [] ``` The storage port defaults to allowing any pod in `flux-system`, the namespace where the stock Flux consumers run. 
Add an entry per extra consumer namespace — for example a stageset-controller running in `stageset-system`: ```yaml networkPolicy: enabled: true storage: from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: flux-system - namespaceSelector: matchLabels: kubernetes.io/metadata.name: stageset-system ``` The metrics port has its own ingress rule, rendered when both `operator.enabled` and `operator.metrics.enabled` are set. Scope it to your monitoring namespace through `networkPolicy.metrics.from`: ```yaml networkPolicy: enabled: true metrics: from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring ``` Anything the per-port knobs do not cover goes into `additionalIngress`, which is merged verbatim into the policy: ```yaml networkPolicy: enabled: true additionalIngress: - ports: - protocol: TCP port: 8080 from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: ingress-nginx ``` ## Opt-in egress Egress is off by default, and deliberately so. Adding the `Egress` policy type flips the JaaS pod to default-deny for outbound traffic — everything not explicitly allowed is dropped. Getting the allow-list complete is the cluster operator's risk, because the two destinations the operator needs most — the kube-apiserver and any external S3 or OTLP endpoint — are not label-selectable and so depend on `ipBlock` CIDRs that vary per cluster. An incomplete list does not fail loudly; it silently cuts the operator off. > **Warning:** Enabling egress **without** an `ipBlock` for the kube-apiserver cuts > the operator off from the control plane. It can no longer mint tokens, read CRs, > publish `ExternalArtifact` resources, hold the leader-election lease, or patch the > webhook caBundle. Always include the apiserver CIDR before turning egress on. 
Find the apiserver's address with: ```shell kubectl --namespace default get endpoints kubernetes -o jsonpath='{.subsets[*].addresses[*].ip}' ``` Use that IP as a `/32` (or your control plane's CIDR for an HA apiserver). A complete operator egress block — DNS, the apiserver, source-controller, S3, and an OTLP collector — looks like this: ```yaml networkPolicy: enabled: true egress: enabled: true # DNS to the cluster DNS namespace. Without this, every flow below # fails name resolution. dns: true dnsNamespace: kube-system to: # kube-apiserver — not label-selectable, so an ipBlock CIDR. # Replace with the IP(s) from the command above. - to: - ipBlock: cidr: 10.0.0.1/32 ports: - protocol: TCP port: 443 # source-controller — fetching upstream artifacts for sourceRef snippets. - to: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: flux-system # S3 bucket — an ipBlock CIDR for an external endpoint. For in-cluster # MinIO, use a namespaceSelector instead. - to: - ipBlock: cidr: 203.0.113.0/24 ports: - protocol: TCP port: 443 # OTLP collector — an ipBlock CIDR for an external endpoint. For an # in-cluster collector, use a namespaceSelector instead. - to: - ipBlock: cidr: 198.51.100.10/32 ports: - protocol: TCP port: 4317 ``` Trim this to what your install actually uses: drop the S3 block on the local storage backend, and drop the OTLP block when [tracing](/usage/observability/) is off. The apiserver and DNS rules are non-negotiable for the operator. Storage destinations are covered on the [storage and high availability page](/usage/storage-and-ha/), and tenancy on the [tenancy and RBAC page](/usage/tenancy-and-rbac/). For the full set of chart values, see the [chart README](https://github.com/metio/helm-charts/tree/main/charts/jaas). --- # Observability Source: https://jaas.projects.metio.wtf/usage/observability/ JaaS gives you four ways to see what it is doing in a cluster. 
Structured logs tell you what happened on a single request or reconcile; traces follow one operation across its spans; metrics aggregate behaviour into time series for dashboards and alerts; and alerts plus Kubernetes Events turn a sustained problem into a page or a notification. Each pillar has its own page covering both the binary's flags and the Helm chart keys that drive them: - [Logging](/usage/logging/) — the `log/slog` logger, `--log-level` and `--log-format`, and reading JSON logs with `kubectl logs` and `jq`. - [Tracing](/usage/tracing/) — OTLP gRPC export to an OpenTelemetry collector, sampling, and viewing spans. - [Metrics](/usage/metrics/) — the Prometheus endpoint, the custom `jaas_*` metric family, scraping with a `ServiceMonitor`, and querying with PromQL. - [Alerting](/usage/alerting/) — the opt-in `PrometheusRule` alert catalog with its runbook links, plus Kubernetes Events routed through Flux's notification-controller. Logging applies to every mode JaaS runs in. Tracing, metrics, and alerting are operator-mode concerns and take effect once `--enable-flux-integration` is set (`operator.enabled` in the chart). --- # Operator mode Source: https://jaas.projects.metio.wtf/usage/operator-mode/ JaaS runs as a Kubernetes operator alongside its HTTP renderer. In this mode it watches custom resources, evaluates the Jsonnet they describe, and publishes the result as a Flux [`ExternalArtifact`](https://fluxcd.io/) that downstream controllers consume. The HTTP renderer keeps running; the operator is an additional set of goroutines, not a separate binary. ## Enabling the operator Set `--enable-flux-integration` on the binary: ```shell jaas --enable-flux-integration \ --storage-path=/var/lib/jaas/artifacts \ --storage-base-url=http://jaas-storage.jaas.svc:8082 ``` `--storage-path` and `--storage-base-url` are required in operator mode — they tell the operator where to write artifact tarballs and the public URL prefix downstream consumers fetch them from. 
With the [Helm chart](/installation/) set `operator.enabled: true`: ```yaml operator: enabled: true ``` The chart wires the storage paths, leader election, RBAC, and the metrics Service for you. ## The two custom resources The operator watches two CRDs in the `jaas.metio.wtf/v1` API group. Both are namespaced. | Kind | Scope | Purpose | |------------------|------------|----------------------------------------------------------------------------------| | `JsonnetSnippet` | Namespaced | A Jsonnet snippet to evaluate and publish as an `ExternalArtifact`. | | `JsonnetLibrary` | Namespaced | Reusable `.libsonnet` files that snippets in the same namespace import. | A `JsonnetSnippet` is the published unit. A `JsonnetLibrary` carries no artifact of its own — it exists to be imported by snippets. The full field reference for each lives at [/api/jsonnetsnippet/](/api/jsonnetsnippet/); the library CRD is covered in [Jsonnet libraries](/usage/jsonnet-libraries/). ## What the operator produces Each reconcile of a `JsonnetSnippet` evaluates the snippet and writes the result into a tar.gz, then upserts a Flux `ExternalArtifact` CR whose `status.artifact.url` points at the operator's storage HTTP server. In the default `rendered` output mode the archive holds a single `rendered.json` — the evaluated JSON. The published artifact's URL is also mirrored onto the snippet's own `status.artifactURL`, so `kubectl describe jsonnetsnippet` answers "where is my rendered output?" without a second lookup. Any controller that understands Flux's `ExternalArtifact` reads the result by pointing a `sourceRef` at it: ```yaml apiVersion: kustomize.toolkit.fluxcd.io/v1 kind: Kustomization metadata: name: consume-rendered namespace: default spec: sourceRef: kind: ExternalArtifact name: hello-world ``` Real consumers of the published artifact include: - [grafana-operator](https://grafana.github.io/grafana-operator/) — renders Grafana dashboards from the evaluated JSON. 
- [stageset-controller](https://stageset.projects.metio.wtf/) — drives a staged rollout of the rendered manifests. - Flux's own `kustomize-controller` and `helm-controller`, which apply the rendered output as part of a GitOps pipeline. ## A minimal snippet The simplest `JsonnetSnippet` carries its Jsonnet inline in `spec.files` and seeds two external variables: ```yaml apiVersion: v1 kind: ServiceAccount metadata: name: hello-world-tenant namespace: default --- apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: hello-world namespace: default spec: serviceAccountName: hello-world-tenant entryFile: main.jsonnet files: main.jsonnet: | { greeting: 'hello', recipient: std.extVar('audience'), timestamp: std.extVar('now'), } externalVariables: audience: world now: "2026-06-09T12:00:00Z" ``` `spec.serviceAccountName` names the ServiceAccount the operator impersonates for every API call this snippet drives — the artifact write, source fetches, library reads. That ServiceAccount's RBAC, not the operator's, governs what the snippet can reach. See [Tenancy and RBAC](/usage/tenancy-and-rbac/) for the verbs the tenant ServiceAccount needs. ## Lifecycle knobs Two `spec` fields control when and whether the operator reconciles a snippet. Both mirror Flux's source-controller conventions. ### `spec.suspend` Set `spec.suspend: true` to pause reconciliation without deleting the snippet. The operator skips the evaluation pipeline, leaves the existing `ExternalArtifact` in place, and reports `Ready=False` with reason `Suspended`. Setting it back to `false` resumes reconciliation. The published artifact stays available the whole time, so downstream consumers keep reading the last rendered output while the snippet is paused. 
```yaml spec: suspend: true ``` ### `spec.interval` Set `spec.interval` to re-render the snippet on a fixed cadence even when no watch event fires: ```yaml spec: interval: 10m ``` A `JsonnetSnippet` re-renders whenever its source, libraries, or referenced Flux sources change. `spec.interval` adds a steady-state cadence on top of that, so the snippet picks up state outside the watched graph — external-variable environment drift on the operator pod, OCI library refreshes, and similar. The interval is bounded at admission to between `30s` and `24h`. Failed reconciles still use controller-runtime's exponential backoff; the interval governs only the steady-state cadence. ## Where to go next - [Snippet sources](/usage/snippet-sources/) — inline files, a Flux `sourceRef`, multi-snippet trees, and chaining one snippet's output into another. - [Jsonnet libraries](/usage/jsonnet-libraries/) — the `JsonnetLibrary` CRD, OCI-mounted shared libraries, and how imports resolve. - [Tenancy and RBAC](/usage/tenancy-and-rbac/) — per-snippet impersonation and the tenant ServiceAccount's permissions. - [Storage and HA](/usage/storage-and-ha/) — the local and S3 backends, leader election, and revision retention. - [/api/jsonnetsnippet/](/api/jsonnetsnippet/) — the exhaustive field-by-field reference. --- # Rendering endpoint Source: https://jaas.projects.metio.wtf/usage/rendering-endpoint/ Send a `GET` to the rendering endpoint with a snippet name and JaaS returns the evaluated Jsonnet as JSON: ```shell curl http://127.0.0.1:8080/jsonnet/example1 ``` The Jsonnet server binds `127.0.0.1:8080` by default (`--listen-address`, `--port`). The URL shape is the endpoint path followed by the snippet name, `GET /jsonnet/{snippet...}` by default, where `{snippet...}` is a trailing path segment that may contain slashes. A successful response carries `Content-Type: application/json` and the rendered document. ## The endpoint path The leading path segment defaults to `jsonnet` and is set with `--jsonnet-endpoint-path`. 
Running with `--jsonnet-endpoint-path render` moves the endpoint to `GET /render/{snippet...}`: ```shell ./jaas --jsonnet-endpoint-path render --snippet-directory examples/snippets/dashboards curl http://127.0.0.1:8080/render/example1 ``` ## Snippet resolution The `{snippet...}` segment names which file JaaS evaluates. Resolution checks the `--snippet` files first, then looks for a `main.jsonnet` in the subdirectory of that name under each `--snippet-directory`. See [Snippets and libraries](/usage/snippets-and-libraries/) for how to declare both. Resolution is sandboxed through Go's `os.Root`, which rejects `..` traversal and symlinks that escape the configured directory. A crafted URL never reaches a file outside the snippet roots: ```shell curl -i http://127.0.0.1:8080/jsonnet/../etc/passwd # HTTP/1.1 404 Not Found ``` ## Management probes A second HTTP server — the management server — exposes the Kubernetes lifecycle probes. It binds `127.0.0.1:8081` by default (`--management-listen-address`, `--management-port`): | Path | Meaning | |----------|------------------------------------------------------------------------------| | `/live` | Liveness. Unconditional `200`. | | `/start` | Startup. Consults health state; `200` once started, otherwise `503` + JSON. | | `/ready` | Readiness. Consults health state; `200` when ready, otherwise `503` + JSON. | A not-ready probe returns a JSON body naming the state: ```shell curl -i http://127.0.0.1:8081/ready # HTTP/1.1 503 Service Unavailable # {"status":"not ready"} ``` ## Error contract Every non-2xx response carries a JSON body with `Content-Type: application/json` so programmatic callers can pick the failure apart: ```json { "error": "snippet_not_found", "message": "snippet \"missing\" not found", "snippet": "missing" } ``` The `error` field is a stable identifier — callers match on it, and these strings do not change. The `message` field carries human-readable detail. 
The `snippet` field echoes the requested name when one was parsed, and is omitted otherwise. | `error` | HTTP status | When | |--------------------------|------------:|---------------------------------------------------------------| | `method_not_allowed` | `405` | Anything other than `GET` on the endpoint. | | `snippet_not_found` | `404` | The requested snippet name resolves to no file. | | `evaluation_timeout` | `504` | Evaluation exceeded `--evaluation-timeout`. | | `evaluation_unavailable` | `503` | The concurrent-eval cap (`--max-concurrent-evals`) is full. | | `evaluation_failed` | `400` | go-jsonnet returned an error (syntax, missing import, stack-limit exceeded). | For `evaluation_failed`, `message` is the raw go-jsonnet diagnostic, including the file and line numbers from the snippet on disk. That diagnostic can name on-disk paths, so treat it as cluster-internal detail. A client that closes the connection mid-evaluation receives no body and no status line — the handler detects the cancellation and returns without writing anything. The timeout, stack, and concurrency caps that drive `evaluation_timeout` and `evaluation_unavailable` are documented in [Evaluation and security](/usage/evaluation-and-security/). To pass values into a render, see [External variables and TLAs](/usage/external-variables-and-tlas/). --- # Service mesh Source: https://jaas.projects.metio.wtf/usage/service-mesh/ The Helm chart ships an opt-in service-mesh authorization layer for the JaaS pod. It is off by default and renders only when `serviceMesh.enabled` is `true`. Where the [network policy](/usage/network-policy/) operates at L3/L4 — which pods and IP ranges may reach which ports — the service mesh operates at L7 with **identity-based authorization** and **mTLS**: it asks *which mesh identity* is calling, proven by a cryptographic workload certificate, not merely which IP the packet came from. `serviceMesh` and `networkPolicy` are separate fields and compose additively. 
They solve different problems and are best enabled together: the network policy draws the L3/L4 perimeter, and the mesh authorizes meshed callers by SPIFFE identity on top of it. Enabling one does not require the other, and neither weakens the other. ## Opt-in and explicit engine `serviceMesh.engine` selects which mesh dialect the chart renders. It is explicit, not auto-detected: a chart that sniffed the running mesh would render different objects on different clusters from identical values, which breaks GitOps determinism. You name the engine, and the rendered manifest is the same everywhere. | `engine` | Renders | API | | --- | --- | --- | | `istio` (default) | `AuthorizationPolicy` + (optional) `PeerAuthentication` | `security.istio.io/v1` | | `linkerd` | `Server` + `AuthorizationPolicy` + `MeshTLSAuthentication` | `policy.linkerd.io` | The rendered objects are inert unless the named mesh is actually installed and the JaaS pod is injected into it. Enabling `serviceMesh` on an un-meshed pod renders the manifests but changes nothing about how traffic flows. ```yaml serviceMesh: enabled: true engine: istio ``` ## Per-port authorization Each mesh-reachable port carries a `from` list naming the mesh identities allowed to call it. An **empty `from` list leaves that port open** to any meshed caller, mirroring `networkPolicy`'s empty-`from`-is-open semantics; a non-empty list restricts the port to the listed identities. The mesh-reachable ports are the Jsonnet HTTP port, the storage port, and the metrics port: | Port | Mode | `from` restricts | |---|---|---| | Jsonnet HTTP (`ports.http`, `8080`) | always | Callers of the `/jsonnet` endpoint | | Storage HTTP (`ports.storage`, `8082`) | operator | Flux consumers that dereference `ExternalArtifact` tarballs | | Metrics (`ports.metrics`, `8083`) | operator + metrics | Prometheus scraping `/metrics` | A `from` entry is a source matcher with two fields: - **`principals`** — SPIFFE/mesh identities. 
On Istio these are `source.principals` (for example `cluster.local/ns/flux-system/sa/kustomize-controller`). On Linkerd they map to `MeshTLSAuthentication` identities (the ServiceAccount name and namespace followed by `.serviceaccount.identity.linkerd.cluster.local`, for example `kustomize-controller.flux-system.serviceaccount.identity.linkerd.cluster.local`, or `*`). - **`namespaces`** — source namespaces (Istio `source.namespaces`). **Istio-only** — Linkerd authenticates by workload identity, not by namespace, so this field is ignored under the `linkerd` engine. ```yaml serviceMesh: enabled: true engine: istio # Restrict the storage port to the kustomize-controller and helm-controller identities. storage: from: - principals: - cluster.local/ns/flux-system/sa/kustomize-controller - cluster.local/ns/flux-system/sa/helm-controller # Scope metrics scraping to the monitoring namespace. metrics: from: - namespaces: - monitoring # Leave http open so any meshed caller can reach the renderer. http: from: [] ``` ## Non-mesh clients keep working The kube-apiserver (which dials the admission webhook) and the kubelet (which dials the readiness, liveness, and startup probes) are **not part of the mesh**. They carry no mesh identity and present no workload certificate, so any authorization rule that demanded one would reject them — admission would break and probes would fail. The chart deliberately leaves the **webhook and management ports open** so these non-mesh clients always connect: - **Istio** adds an allow-any rule covering the webhook (`9443`) and management (`8081`) ports, so no identity is required on them. - With **`mtls: strict`**, the chart additionally sets `portLevelMtls: PERMISSIVE` on those two ports, so a plaintext connection from the apiserver or kubelet is still accepted even while every other port enforces mTLS. - **Linkerd** renders no `Server` for those ports, leaving them outside the mesh's authorization scope entirely. This is why admission and probes keep working with the mesh fully enabled. 
Authenticity on the webhook port is enforced by TLS and the CA bundle on the `ValidatingWebhookConfiguration`, not by the mesh — see the [admission webhook page](/usage/admission-webhook/). ## mTLS `serviceMesh.mtls` sets the mTLS posture and applies to the **Istio engine only**; Linkerd negotiates mTLS automatically between meshed pods, so the knob is ignored there. | `mtls` | Effect (Istio) | | --- | --- | | `""` (default) | Defers to the mesh's own default (mesh-wide `PeerAuthentication` / `MeshConfig`) — no `PeerAuthentication` is rendered | | `permissive` | Renders a `PeerAuthentication` accepting both mTLS and plaintext | | `strict` | Requires mTLS on the workload's ports, **except** the webhook and management ports, which get a port-level `PERMISSIVE` carve-out | ```yaml serviceMesh: enabled: true engine: istio mtls: strict ``` Under `strict`, every mesh-reachable port enforces mTLS while the apiserver and kubelet still reach the webhook and probes over plaintext via the carve-out above. ## Default-deny `serviceMesh.defaultDeny.enabled` (default `false`) additionally renders a namespace-wide default-deny so every pod in the install namespace rejects unauthorized mesh traffic and the per-workload allows become the only exceptions (a zero-trust namespace). - **Istio** renders an empty-spec `AuthorizationPolicy` (deny-all) scoped to the whole namespace; it sits at lower precedence than the workload `ALLOW`, so the per-port allows always win for the JaaS pod while everything else is denied. - **Linkerd** has no per-object deny-all; the namespace default is set via the `config.linkerd.io/default-inbound-policy` annotation, stamped onto the chart-managed Namespace (requires `namespace.create=true`) — otherwise annotate the namespace out of band. Enable this **only when JaaS owns its namespace**, because the deny-all also denies every co-located workload that does not have its own allowing authorization. 
```yaml serviceMesh: enabled: true engine: istio defaultDeny: enabled: true # only when JaaS owns this namespace ``` ## Native passthrough Anything the per-port `from` knobs cannot express goes into the engine's native passthrough list, merged verbatim into the rendered objects. Under the `istio` engine, `serviceMesh.istio.rules` are merged into the `AuthorizationPolicy`'s `spec.rules` (the `security.istio.io/v1` rule schema) — use it for path/method matchers, `when` JWT-claim conditions, `ipBlocks`, and similar: ```yaml serviceMesh: enabled: true engine: istio istio: rules: - from: - source: requestPrincipals: - "https://accounts.example.com/*" to: - operation: paths: - /jsonnet/* ``` Under the `linkerd` engine, `serviceMesh.linkerd.authorizations` are appended verbatim as additional documents after the rendered `Server` / `AuthorizationPolicy` / `MeshTLSAuthentication` set — each entry must be a complete `policy.linkerd.io` object: ```yaml serviceMesh: enabled: true engine: linkerd linkerd: authorizations: - apiVersion: policy.linkerd.io/v1beta3 kind: AuthorizationPolicy metadata: name: jaas-extra spec: targetRef: group: policy.linkerd.io kind: Server name: jaas-http requiredAuthenticationRefs: - kind: ServiceAccount name: custom-caller namespace: tenant-a ``` ## Which traffic gets authorized The mesh authorizes only meshed callers on the mesh-reachable ports; the non-mesh ports stay open by design so the control plane keeps working. | Port | Authorized by the mesh? 
| Why | |---|---|---| | Jsonnet HTTP (`8080`) | Yes — `serviceMesh.http.from` | Meshed callers of the renderer | | Storage HTTP (`8082`) | Yes — `serviceMesh.storage.from` | Meshed Flux consumers | | Metrics (`8083`) | Yes — `serviceMesh.metrics.from` | Meshed Prometheus scrapers | | Webhook (`9443`) | No — open carve-out | The kube-apiserver is not in the mesh | | Management probes (`8081`) | No — open carve-out | The kubelet is not in the mesh | For the full set of chart values, see [Helm chart values](/installation/helm-values/). --- # Snippet sources Source: https://jaas.projects.metio.wtf/usage/snippet-sources/ A `JsonnetSnippet` declares exactly one source for its Jsonnet bytes: either inline `spec.files` or a `spec.sourceRef` pointing at a Flux source. Admission rejects a snippet that sets both or neither. The operator resolves the source into an in-memory file tree, evaluates `spec.entryFile` within it, and publishes the result. ## Inline files `spec.files` is a map of filename to Jsonnet source. The operator evaluates the entry file (`spec.entryFile`, default `main.jsonnet`) against the rest of the map. This is the simplest source — the snippet is self-contained, with no external dependency to fetch: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: hello-world namespace: default spec: serviceAccountName: hello-world-tenant entryFile: main.jsonnet files: main.jsonnet: | { greeting: 'hello', recipient: std.extVar('audience'), } externalVariables: audience: world ``` ## A Flux source `spec.sourceRef` points at a Flux source CR whose artifact tarball the operator fetches and extracts into the snippet's file tree. The `kind` is one of `GitRepository`, `OCIRepository`, `Bucket`, or `ExternalArtifact` — see Flux's [source-controller documentation](https://fluxcd.io/) for how each source CR publishes its artifact. 
When the referenced source republishes — a new commit lands on the `GitRepository`, a new tag pushes to the `OCIRepository` — the operator's watch on Flux source kinds re-queues the snippet and re-renders it. No `spec.interval` is required for this; the watch is event-driven. ```yaml apiVersion: source.toolkit.fluxcd.io/v1 kind: GitRepository metadata: name: dashboards-source namespace: default spec: interval: 5m url: https://github.com/example-org/grafana-dashboards ref: branch: main --- apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: dashboards/api-latency.jsonnet sourceRef: kind: GitRepository name: dashboards-source path: dashboards/ ``` `spec.sourceRef.path` narrows extraction to a subdirectory of the artifact's tarball. Empty means the whole tree. The tenant ServiceAccount needs `get` on the referenced source kind — see [Tenancy and RBAC](/usage/tenancy-and-rbac/). ### The entry file and multi-snippet trees `spec.entryFile` names the file — relative to the resolved source root — that go-jsonnet evaluates. It defaults to `main.jsonnet`. The field is restricted to relative `[A-Za-z0-9._/-]+` paths with no `..` segments, so it cannot traverse out of the extracted tree. One Flux source often carries many snippets. A shared dashboards repository, for example, holds one `.jsonnet` file per dashboard. 
Rather than one source per dashboard, point several `JsonnetSnippet` resources at the same `GitRepository` and give each a different `entryFile`: ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: api-latency-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: dashboards/api-latency.jsonnet sourceRef: kind: GitRepository name: dashboards-source --- apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: error-budget-dashboard namespace: default spec: serviceAccountName: dashboards-tenant entryFile: dashboards/error-budget.jsonnet sourceRef: kind: GitRepository name: dashboards-source ``` Both snippets share the source fetch and re-render together when the repository republishes, but each publishes its own `ExternalArtifact` from its own entry file. ## Chaining snippets A `JsonnetSnippet` can source from the `ExternalArtifact` another snippet publishes. This composes a pipeline of renders: snippet A evaluates and publishes its JSON, and snippet B takes that JSON as its input, transforms it, and publishes a second artifact. A downstream consumer deploys only the final artifact. Chaining works because the `ExternalArtifact` is a Flux source like any other. Snippet B sets `spec.sourceRef` with `kind: ExternalArtifact` and `name` pointing at the producing snippet — an `ExternalArtifact` is published under the producing `JsonnetSnippet`'s name. The operator fetches A's artifact tarball into B's file tree. In the default `rendered` output mode that tarball holds a single `rendered.json`, so snippet B sets `entryFile: rendered.json` to evaluate A's output. Because JSON is valid Jsonnet, B's entry file can extend the imported object directly: ```yaml # Snippet A renders a shared config blob other snippets consume. 
apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: base-config namespace: default spec: serviceAccountName: chained-tenant entryFile: main.jsonnet files: main.jsonnet: | { cluster: 'prod', region: 'eu-west-1', retentionDays: 30, } --- # Snippet B sources from base-config's ExternalArtifact and extends it. apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: derived-config namespace: default spec: serviceAccountName: chained-tenant entryFile: rendered.json sourceRef: kind: ExternalArtifact name: base-config ``` `derived-config` re-emits `base-config`'s rendered JSON as its own artifact. The operator's watch on `ExternalArtifact` updates re-queues `derived-config` whenever `base-config` republishes, so the pipeline stays current end to end. ### Source variant The rendered variant above passes evaluated JSON downstream. The source variant passes raw Jsonnet downstream instead, for snippet B to re-evaluate itself. Snippet A sets `spec.output: source`, so its `ExternalArtifact` carries A's raw `.jsonnet` / `.libsonnet` files rather than the evaluated JSON. Snippet B points `spec.sourceRef` at A's `ExternalArtifact` and imports A's files as Jsonnet, re-evaluating them with B's own external variables, TLAs, and libraries. A becomes a source that the pipeline produces dynamically rather than one an operator authors by hand: ```yaml # Snippet A publishes its raw Jsonnet, not its evaluated output. apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: dashboard-template namespace: default spec: serviceAccountName: chained-tenant output: source entryFile: main.jsonnet files: main.jsonnet: | function(env='dev') { title: 'API latency — ' + env, refresh: if env == 'prod' then '30s' else '5m', } --- # Snippet B sources A's raw Jsonnet and re-evaluates it with its own TLAs. 
apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: dashboard-prod namespace: default spec: serviceAccountName: chained-tenant entryFile: main.jsonnet sourceRef: kind: ExternalArtifact name: dashboard-template tlas: env: - prod ``` Because A published `source` output, B's `sourceRef` extracts A's raw `main.jsonnet` into B's file tree, and B evaluates it as the entry file with `env=prod` supplied as a TLA. When A's template changes, A republishes and the `ExternalArtifact` watch re-renders B against the new Jsonnet. Choose between the two by what the downstream snippet needs: rendered chaining passes JSON data downstream; source chaining passes Jsonnet to be re-evaluated downstream. For when to reach for a `source`-output snippet instead of a `JsonnetLibrary`, see [JsonnetLibrary vs a source-output snippet](/usage/jsonnet-libraries/#jsonnetlibrary-vs-a-source-output-snippet). The tenant ServiceAccount needs `get` on `externalartifacts.source.toolkit.fluxcd.io` for both variants; see [Tenancy and RBAC](/usage/tenancy-and-rbac/). ### Cycle detection A snippet cannot transitively depend on itself. The operator walks the dependency graph — `spec.sourceRef` edges to `ExternalArtifact`s and their producing snippets, plus `spec.libraries` edges through `JsonnetLibrary` `sourceRef`s — at reconcile time, before any tenant work. If the walk revisits the snippet it started from, the operator refuses to publish and reports `Ready=False` with reason `DependencyCycle`. This catches a chain that feeds back into itself directly or through a library, so a cycle surfaces as a clear status condition rather than an endless re-render loop. ## Related pages - [Jsonnet libraries](/usage/jsonnet-libraries/) — reusable `.libsonnet` files referenced via `spec.libraries`. - [Tenancy and RBAC](/usage/tenancy-and-rbac/) — the verbs the tenant ServiceAccount needs for each source kind. 
--- # Snippets and libraries Source: https://jaas.projects.metio.wtf/usage/snippets-and-libraries/ In renderer mode you declare snippets and libraries on disk through command-line flags. A snippet becomes reachable at the [rendering endpoint](/usage/rendering-endpoint/); a library is importable by any snippet. ## Directory snippets Point `--snippet-directory` at a directory whose subdirectories each hold a `main.jsonnet`. Each subdirectory name becomes a snippet name: ```shell ./jaas --snippet-directory examples/snippets/dashboards curl http://127.0.0.1:8080/jsonnet/example1 ``` Given this layout, `example1` resolves to `examples/snippets/dashboards/example1/main.jsonnet`: ```text examples/snippets/dashboards ├── example1 │ └── main.jsonnet ├── tla-example │ └── main.jsonnet └── multi-tla └── main.jsonnet ``` ## File snippets Point `--snippet` at an individual Jsonnet file. The file path becomes the snippet name: ```shell ./jaas --snippet examples/snippets/example.jsonnet curl http://127.0.0.1:8080/jsonnet/examples/snippets/example.jsonnet ``` Both `--snippet` and `--snippet-directory` are repeatable, so one process serves several roots: ```shell ./jaas \ --snippet-directory examples/snippets/dashboards \ --snippet examples/snippets/example.jsonnet ``` ## Libraries Point `--library-path` at a directory that holds importable Jsonnet libraries. A snippet imports a library by its path under that directory: ```text examples/libraries ├── examplonet │ └── main.libsonnet └── text └── welcome.txt ``` ```shell ./jaas \ --snippet-directory examples/snippets/dashboards \ --library-path examples/libraries ``` A snippet then imports the library with a string-literal path: ```jsonnet local examplonet = import 'examplonet/main.libsonnet'; { person1: { name: examplonet.standard, welcome: 'Hello ' + self.name + '!', }, } ``` `--library-path` is repeatable. 
When the same import path matches under more than one library directory, the rightmost matching directory wins — list the override directory last. ## Embedding non-Jsonnet files Use `importstr` to pull the raw contents of a file under a library path into a snippet as a string. The `embed-text` example reads a text file from the `text/` library: ```jsonnet { banner: importstr 'text/welcome.txt', length: std.length(self.banner), } ``` Any file reachable under a `--library-path` directory or the snippet's own directory can be `import`-ed or `importstr`-ed by any snippet. Scope these directories tightly — see [Evaluation and security](/usage/evaluation-and-security/). For the operator-side equivalent — `JsonnetLibrary` CRDs and OCI-mounted shared libraries — see [Jsonnet libraries](/usage/jsonnet-libraries/). --- # Storage and high availability Source: https://jaas.projects.metio.wtf/usage/storage-and-ha/ In [operator mode](/usage/operator-mode/) JaaS renders each `JsonnetSnippet` into a tar.gz artifact, stores it, and publishes an `ExternalArtifact` CR that points a Flux consumer at the tarball over HTTP. JaaS publishes artifacts through one of two storage backends — local filesystem or S3-compatible object storage — with optional leader election for multi-replica high availability and configurable revision retention. ## Serving the tarballs Regardless of backend, the operator runs an HTTP server that Flux consumers fetch artifacts from. Three flags govern it, and `--storage-base-url` and `--storage-path` are required whenever `--enable-flux-integration` is set: - `--storage-base-url` — the public URL prefix stamped into each `ExternalArtifact`'s `status.artifact.url`. This is what downstream Flux controllers dial, so it must be reachable from them. - `--storage-listen-address` (default `0.0.0.0`) and `--storage-port` (default `8082`) — the bind address of the storage HTTP server. 
## Local backend

`--storage-backend=local` (the default) writes tarballs to the filesystem under `--storage-path`. The Helm chart pairs this with an `emptyDir` by default, or a `PersistentVolumeClaim` when `operator.storage.persistence.enabled: true`. A ReadWriteOnce PVC caps the install at a single replica, because only one pod can mount the volume for writing. If you need more than one replica, use the S3 backend below.

## S3 backend

`--storage-backend=s3` stores tarballs in any S3-compatible bucket (AWS S3, MinIO, Ceph RGW, Backblaze B2, and similar). The bucket must already exist. Configure it with:

| Flag | Purpose |
|---|---|
| `--s3-endpoint` | S3 host:port, e.g. `s3.amazonaws.com` or `minio.minio.svc:9000`. Required. |
| `--s3-bucket` | Bucket the artifacts live in. Required. |
| `--s3-prefix` | Optional key prefix so JaaS can coexist with other tenants in one bucket. |
| `--s3-region` | Region the bucket lives in. Required for AWS multi-region; ignored by most other servers. |
| `--s3-use-ssl` | Talk HTTPS to the endpoint (default `true`). Set `false` only for local MinIO over HTTP. |
| `--s3-access-key` | Static access key ID. |
| `--s3-secret-key` | Static secret access key, paired with `--s3-access-key`. |
| `--s3-session-token` | Optional session token for temporary credentials. |
| `--s3-anonymous` | Skip request signing entirely; only for a public bucket, test and dev only. |

Leave `--s3-access-key` and `--s3-secret-key` empty to engage the IAM/IRSA discovery chain (environment credentials, web-identity tokens, and EC2/EKS instance metadata), so a pod running with an IRSA-annotated ServiceAccount needs no static keys.

### Bring your own Secret

The chart never bakes credentials into a rendered Secret. It references a Secret you provide by name (`operator.storage.s3.credentialsSecret.name`) and consumes it via `envFrom`, expecting the keys `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and the optional `AWS_SESSION_TOKEN`.
The Secret's provenance is yours to choose. Any of these can produce that Secret, and the chart works with all of them unchanged; point the tool at the same name the chart references:

- **External Secrets Operator**: an `ExternalSecret` that syncs from Vault, AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault into a Secret of that name.
- **Sealed Secrets**: a `SealedSecret` that the controller decrypts in-cluster.
- **Vault Agent / CSI**: a Secret materialized from Vault.
- **SOPS**: a Secret decrypted by your GitOps tooling at apply time.
- **`kubectl create secret`**: a plain hand-managed Secret.

This is why the chart ships no native `ExternalSecret` resource: the reference seam already integrates with every secret backend, without coupling the chart to one operator's CRDs. On the cloud, **IAM/IRSA** (leaving the credentials Secret unset, as above) avoids a stored secret entirely and is preferred where available.

A minimal External Secrets example whose `target.name` matches the referenced Secret and whose keys are the ones JaaS expects:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: jaas-s3
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: SecretStore
  target:
    name: jaas-s3-credentials   # = operator.storage.s3.credentialsSecret.name
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: jaas/s3
        property: access_key_id
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: jaas/s3
        property: secret_access_key
```

For the full chart values, see the [Helm values](/installation/helm-values/) reference.

## Leader election

Leader election is on by default in operator mode (`--leader-election`, honored only when `--enable-flux-integration` is set). The lease lets exactly one replica reconcile at a time. On `SIGTERM` during a rolling update the lease is released immediately rather than waiting out the 15-second lease duration, so the next replica picks up reconciliation within seconds.
Set `--leader-election=false` only when running a single replica with no rollout overlap.

## Multi-replica HA

High availability is the S3 backend plus leader election: every replica reads from the same bucket, and only the lease-holder writes. No ReadWriteMany storage class is required. During a rolling update the lease hands over on `SIGTERM`, so the write path moves to the new leader without a manual step.

## Revision retention and rollback

`spec.history` on a `JsonnetSnippet` (default `1`, maximum `50`) keeps the last N rendered revisions in storage. Downstream consumers can pin to an older `sha256` for rollback or blue-green cutover, instead of always tracking the newest render.

Two flags shape how superseded revisions age out:

- `--artifact-gc-grace` (default `5m`) retains a revision for a short window after it leaves the keep-set. This closes the pin→fetch race in which a Flux consumer reads `status.artifact` a moment before the operator garbage-collects the revision that consumer pinned. Set it to `0` to disable the grace and restore eager pruning. Snippet teardown (the deletion path) is unaffected by this flag.
- `--max-artifact-bytes` (default `0`, disabled) caps the rendered artifact size in bytes. A snippet whose render exceeds the cap fails with `ReasonArtifactTooLarge` rather than publishing an oversized tarball.

## Orphan-tmp sweep

This is a local-backend concern only. On the filesystem backend a `Put` that dies after writing the temporary file but before the atomic rename leaves a `.tar.gz.tmp` residue, and a background sweep removes it. The S3 backend has no such residue, because `PutObject` is atomic, so its sweep is a no-op:

- `--storage-sweep-interval` (default `10m`): how often the sweep runs. `0` disables it.
- `--storage-sweep-max-tmp-age` (default `30m`): the minimum age before an orphaned `.tmp` file is eligible, set wider than the longest plausible in-flight `Put` so the sweep never races a live writer.
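
To make the age rule concrete, here is a minimal sketch of one sweep pass over a local storage directory. It only illustrates the "older than the maximum tmp age" rule described above; the `sweep_orphans` helper, its parameters, and the directory walk are assumptions for illustration, not the operator's code.

```python
import os
import time


def sweep_orphans(root, max_tmp_age_seconds, now=None):
    """Delete *.tar.gz.tmp files under `root` that are at least the given age.

    Returns the sorted list of removed paths. Hypothetical helper, not JaaS code.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".tar.gz.tmp"):
                continue  # published tarballs are never touched
            path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(path)
                if age >= max_tmp_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # A live writer renamed or removed the file in the meantime.
                continue
    return sorted(removed)
```

Calling `sweep_orphans("/data", 30 * 60)` would mirror the `30m` default, with `/data` standing in for whatever `--storage-path` points at. Younger temporary files are left alone so an in-flight write is not disturbed.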
For production sizing of these knobs, see the [production guide](/installation/production/). The full flag list with defaults is on the [configuration page](/installation/configuration/). --- # Tenancy and RBAC Source: https://jaas.projects.metio.wtf/usage/tenancy-and-rbac/ In [operator mode](/usage/operator-mode/) the JaaS operator never acts with its own broad privileges when touching tenant resources. Every reconcile of a `JsonnetSnippet` runs against the RBAC of a tenant ServiceAccount, so a snippet can only reach what its own ServiceAccount is allowed to reach. ## Per-snippet impersonation Each `JsonnetSnippet` carries a `spec.serviceAccountName`. On every reconcile the operator mints a short-lived Bearer token for that ServiceAccount through the Kubernetes TokenRequest API (`serviceaccounts/token: create`) and performs all tenant-side API calls — reading `JsonnetLibrary` objects, fetching Flux source artifacts, and writing the published `ExternalArtifact` — as that ServiceAccount. The operator does not use the `impersonate` verb; it uses a real token, so the apiserver evaluates the tenant's own RBAC. When a snippet omits `spec.serviceAccountName`, the operator falls back to the ServiceAccount named in `--default-service-account`. If that flag is also empty, such a snippet is rejected at reconcile time rather than silently running with elevated rights. Set `--default-service-account` to a low-privilege account if you want snippets without an explicit ServiceAccount to reconcile at all. ## The operator's own ClusterRole Because every tenant-side call is the tenant's, the operator's own ClusterRole stays minimal: - `serviceaccounts/token: create` — to mint the Bearer tokens above. - `get`/`list`/`watch` on `customresourcedefinitions.apiextensions.k8s.io` — the CRD watcher subscribes to the cluster's CRD stream so that Flux source-kind watches engage automatically when a previously-absent CRD becomes established, without a process restart. 
- Watch verbs on the JaaS CRDs (`JsonnetSnippet`, `JsonnetLibrary`) and on the Flux source kinds it chains from (`GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`).

The operator does not need `create`/`update`/`patch` on `ExternalArtifact` in its own ClusterRole. That write is done as the tenant, so the verb lives on the tenant Role below.

## The tenant Role

The ServiceAccount each snippet runs as needs explicit verbs, or the first reconcile fails with `Forbidden` and the failure points at the wrong cause. Grant this Role in the tenant's namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace:
  name: jaas-tenant
rules:
  # Required: the operator writes the snippet's ExternalArtifact as
  # the tenant ServiceAccount. Without these the publish step is denied.
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [externalartifacts]
    verbs: [get, create, update, patch]
  # Required only when the snippet uses spec.libraries (JsonnetLibrary refs).
  - apiGroups: [jaas.metio.wtf]
    resources: [jsonnetlibraries]
    verbs: [get, list]
  # Required only when the snippet uses spec.sourceRef. Grant only
  # the source kinds your tenants actually reference.
  - apiGroups: [source.toolkit.fluxcd.io]
    resources: [gitrepositories, ocirepositories, buckets, externalartifacts]
    verbs: [get]
```

Notes on each rule:

- The `externalartifacts` write verbs (`create`, `update`, `patch`) are mandatory. The operator writes the published artifact CR through the impersonating client on purpose, so one tenant Role governs both source-side reads and artifact-side writes.
- The `jsonnetlibraries` rule is needed only when a snippet references libraries through `spec.libraries`. See [snippet sources](/usage/snippet-sources/) for how libraries reach a snippet.
- The source-kind `get` rule is needed only when a snippet has a `spec.sourceRef`. Grant only the kinds your tenants reference.
The `externalartifacts` entry here covers chained snippets — snippet B reading the `ExternalArtifact` snippet A publishes. ## Binding per namespace For namespace-scoped multitenancy, bind the Role to each tenant ServiceAccount in its own namespace with a `RoleBinding`: ```yaml apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: namespace: name: jaas-tenant roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: jaas-tenant subjects: - kind: ServiceAccount name: namespace: ``` Each tenant namespace gets its own `Role` + `RoleBinding`, so a snippet's blast radius is its own namespace's grants. ## Single-tenant clusters When a cluster runs only your own workloads and snippets do not need isolating from each other, a Role per namespace is more than you need. The operator still impersonates a ServiceAccount — it never applies with its own identity — so the simplest setup is one shared account: 1. Create a single ServiceAccount and grant it the rights your snippets need. On a single-tenant cluster that can be broad: a `ClusterRoleBinding` to the built-in `cluster-admin` ClusterRole lets any snippet read any source and publish into any namespace. ```yaml apiVersion: v1 kind: ServiceAccount metadata: name: jaas-snippets namespace: jaas-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: jaas-snippets-admin roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-admin subjects: - kind: ServiceAccount name: jaas-snippets namespace: jaas-system ``` 2. Point the operator's `--default-service-account` at it (set through the chart's operator values), and leave `spec.serviceAccountName` off your snippets. Every snippet then reconciles as that one account. This trades isolation for simplicity — every snippet has the same rights, so use it only where you trust every snippet author. 
To move to multitenancy later, give individual snippets their own `spec.serviceAccountName` scoped to a tenant Role as above; anything still relying on the default keeps working. ## Restricting cross-namespace references `--no-cross-namespace-refs` defaults to `true`: a `JsonnetSnippet` or `JsonnetLibrary` whose `sourceRef` targets a different namespace is rejected. Keep this on for multitenancy — it stops one tenant from pointing a snippet at another tenant's source. Set it to `false` only when you operate every namespace yourself and deliberately want cross-namespace chaining. ## Narrowing the watch Two flags scope which CRs the operator reconciles: - `--label-selector` narrows the watch to CRs whose labels match the selector. Empty (the default) selects every CR in the watched scope. Use it to run an operator over only a labelled subset of snippets. - `--watch-namespaces` (or the `JAAS_WATCH_NAMESPACES` environment variable) takes a comma-separated namespace list and restricts the manager's cache to those namespaces. Empty (the default) is cluster-wide. The Helm chart's `operator.watchNamespaces` mirrors this: when set, it threads the value into the deployment's `--watch-namespaces` argument and pivots the rendered RBAC to one `RoleBinding` per listed namespace instead of a cluster-wide `ClusterRoleBinding`. Cluster-scoped resources (CRDs, the optional `ValidatingWebhookConfiguration`) stay bound through a `ClusterRoleBinding`, since they are inherently cluster-scoped. The full flag list, with defaults, is on the [configuration page](/installation/configuration/). --- # Tracing Source: https://jaas.projects.metio.wtf/usage/tracing/ The JaaS operator exports OpenTelemetry traces over OTLP gRPC. With an endpoint configured, each reconcile and the work it fans out into — source fetch, library resolution, evaluation, publish — becomes a span you can follow in a tracing backend. 
When no endpoint is set, the OpenTelemetry SDK runs in no-op mode and emits nothing, so tracing carries no cost until you opt in.

## The binary

Three flags configure the exporter:

- `--tracing-endpoint`: the OTLP gRPC collector `host:port`, e.g. `otel-collector.observability.svc:4317`. Empty (the default) disables tracing entirely.
- `--tracing-insecure`: skip TLS when dialing the collector. Default `false`. Use only for in-cluster collectors that do not terminate TLS themselves.
- `--tracing-sample-ratio`: TraceID-ratio sampling between `0.0` and `1.0`. Default `1.0` samples every trace.

The full flag list with defaults is on the [configuration page](/installation/configuration/).

### Viewing spans

Point `--tracing-endpoint` at any OTLP-gRPC-speaking collector (the OpenTelemetry Collector, Jaeger, Tempo, or a vendor agent) and view the spans in whatever backend that collector feeds.

A reconcile span carries the snippet's namespace and name plus the spec generation it acted on (`jaas.generation`), so you can search for one snippet and see the latency breakdown across its fetch, eval, and publish phases. That is the fastest way to tell a slow upstream source fetch apart from a slow evaluation when a snippet's reconcile latency climbs.

When a phase fails (source resolution, library resolution, evaluation, or publish), its span records the error and is marked with an error status, so the failed span shows up as `status=error` and is directly queryable in the tracing backend. Searching for error spans surfaces the exact phase a failing reconcile broke in without reading through the full trace.

On a busy operator, drop `--tracing-sample-ratio` below `1.0` to keep only a fraction of traces; `0.1` samples one in ten. Leave it at `1.0` while diagnosing a specific problem so no trace is dropped.

## The Helm chart

Tracing lives under `operator.tracing`.
It only takes effect in operator mode (`operator.enabled: true`):

```yaml
operator:
  enabled: true
  tracing:
    endpoint: otel-collector.observability.svc:4317
    insecure: true
    sampleRatio: 1.0
```

The keys map directly onto the flags: `endpoint` → `--tracing-endpoint`, `insecure` → `--tracing-insecure`, `sampleRatio` → `--tracing-sample-ratio`. Leaving `endpoint` empty (the default) keeps the SDK in no-op mode.

---

# API reference

Source: https://jaas.projects.metio.wtf/api/

The wire reference for the two custom resources in the `jaas.metio.wtf/v1` group and the Flux `ExternalArtifact` JaaS publishes. For task-oriented guidance, start from [Usage](/usage/).

---

# ExternalArtifact output contract

Source: https://jaas.projects.metio.wtf/api/externalartifact/

For every successfully evaluated `JsonnetSnippet`, the JaaS operator upserts a Flux `ExternalArtifact` CR (`source.toolkit.fluxcd.io/v1`) in the same namespace as the snippet. JaaS does not own the `ExternalArtifact` CRD; it is defined and installed by Flux's source-controller. The full CRD schema is in the [Flux ExternalArtifact reference](https://fluxcd.io/flux/components/source/externalartifacts/); below is the subset JaaS writes and the invariants downstream consumers can rely on.

For task-oriented context, see [Operator mode](/usage/operator-mode/).

## What JaaS writes

### `spec.sourceRef`

JaaS stamps a back-pointer to the originating snippet on `spec.sourceRef`:

```yaml
spec:
  sourceRef:
    apiVersion: jaas.metio.wtf/v1
    kind: JsonnetSnippet
    name:
```

The namespace is always the snippet's own namespace. JaaS never publishes an `ExternalArtifact` to a different namespace.

The three fields (`apiVersion`, `kind`, `name`) are wire-stable: downstream consumers that do producer-aware reverse lookup (such as stageset-controller's RFC-0012 resolution) match on this triple. Renaming any field is a breaking change.
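
A consumer-side reverse lookup on that triple might look like the sketch below. It treats `ExternalArtifact` objects as plain dictionaries, as they would look after decoding the API response; the `artifacts_for_snippet` helper and the sample data are hypothetical, and this is not how stageset-controller implements its resolution.

```python
def artifacts_for_snippet(artifacts, namespace, name):
    """Return the ExternalArtifact dicts whose spec.sourceRef names the snippet.

    Matches on the (apiVersion, kind, name) triple plus the object's own
    namespace. Hypothetical helper for illustration only.
    """
    wanted = ("jaas.metio.wtf/v1", "JsonnetSnippet", name)
    matches = []
    for obj in artifacts:
        ref = obj.get("spec", {}).get("sourceRef", {})
        triple = (ref.get("apiVersion"), ref.get("kind"), ref.get("name"))
        same_namespace = obj.get("metadata", {}).get("namespace") == namespace
        if same_namespace and triple == wanted:
            matches.append(obj)
    return matches
```

Because JaaS publishes only into the snippet's own namespace, filtering on the artifact's namespace together with the triple is enough to identify the producer in this model.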
### `status.artifact` After a successful publish, JaaS writes the following fields under `status.artifact`: | Field | Type | Description | |---|---|---| | `url` | string | HTTP URL of the published tarball. Revision-addressed: `///.tar.gz`. Byte-stable for the lifetime of the revision in the keep-set — re-publishing a different revision does not mutate the bytes at this URL. | | `path` | string | Storage-backend-relative path of the tarball. | | `revision` | string | `sha256:` content hash of the artifact. In `rendered` output mode this is the sha256 of the evaluated JSON; in `source` mode it is a deterministic hash over all source files (sorted by filename). | | `digest` | string | `sha256:` of the tarball bytes (the `.tar.gz` itself, not the content). Used by Flux consumers to verify integrity after download. | | `size` | int64 | Tarball size in bytes. | | `lastUpdateTime` | string | RFC3339 timestamp of the most recent successful publish. | ### `status.conditions` JaaS writes a single `Ready` condition on every successful publish: ```yaml status: conditions: - type: Ready status: "True" reason: Succeeded message: artifact published lastTransitionTime: observedGeneration: ``` `lastTransitionTime` is preserved across steady-state republishes (same `Ready=True`, new revision) so the timestamp does not churn. It advances only when the condition transitions (e.g. from `False` to `True` after a failure clears). ## What downstream consumers rely on **Gate on `Ready=True` before fetching.** Every Flux consumer — including `kustomize-controller`, `helm-controller`, and JaaS's own chained-snippet `sourceRef` resolver — treats an `ExternalArtifact` as not-yet-consumable until `status.conditions[Ready].status == "True"`. A snippet that has not yet completed its first successful reconcile will have no `Ready` condition (or `Ready=False`) and leaves chained snippets blocked with reason `SourceNotReady`. 
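
The two consumer-side checks described so far, gating on `Ready` and verifying the downloaded bytes against `status.artifact.digest`, can be sketched as follows. The objects are plain dictionaries, the helper names are hypothetical, and the sketch assumes the digest is written as `sha256:` followed by a lowercase hex string.

```python
import hashlib


def is_ready(obj):
    """True only when the object carries a Ready condition with status "True"."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition yet: treat as not consumable


def verify_digest(tarball_bytes, digest):
    """Compare the sha256 of the downloaded tarball with the recorded digest.

    Assumes the "sha256:<hex>" form; anything else is rejected.
    """
    algo, _, expected = digest.partition(":")
    if algo != "sha256" or not expected:
        return False
    return hashlib.sha256(tarball_bytes).hexdigest() == expected
```

A consumer would call `is_ready` before fetching `status.artifact.url` and `verify_digest` on the response body afterwards, refusing to use the tarball if either check fails.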
**URL is revision-addressed and byte-stable.** The URL published in `status.artifact.url` has the form `///.tar.gz`. The bytes at that URL are immutable for as long as the revision is in the snippet's keep-set (`spec.history`). Consumers can safely re-fetch a pinned URL (e.g. during rollback) and verify it against the recorded `digest`. Once a revision leaves the keep-set it is garbage-collected after the operator's GC grace period; a fetch after that point returns 404. **Revision identifies content, not time.** Two publishes that produce identical content (same evaluated JSON or same source files) yield the same `revision`. Consumers that cache by revision can skip a re-fetch when the revision has not changed. **The snippet mirrors `status.artifactURL`.** To avoid a second lookup, the originating `JsonnetSnippet` also carries the URL in its own `status.artifactURL`. `kubectl describe jsonnetsnippet` therefore surfaces the artifact location directly. ## Tarball contents The tarball layout depends on `spec.output` of the originating `JsonnetSnippet`: | `spec.output` | Tarball contents | |---|---| | `rendered` (default) | A single `rendered.json` holding the evaluated JSON output. | | `source` | Every source file from the resolved snippet source (inline `spec.files` or the files extracted from the `spec.sourceRef` tarball), with their original relative paths. | All tarballs are produced deterministically: entries are sorted by path and `ModTime` is zeroed. Two publishes from the same input produce byte-identical `.tar.gz` files and therefore the same `revision` and `digest`. --- # JsonnetLibrary Source: https://jaas.projects.metio.wtf/api/jsonnetlibrary/ `JsonnetLibrary` (`jlib`) is a namespaced bundle of `.libsonnet` files that `JsonnetSnippet` CRs in the same namespace can import. The library carries no artifact of its own and has no controller reconciling it today — it exists purely as a supply-side source for snippets. 
The import alias is set on the snippet side via `LibraryRef.importPath` (defaulting to the library's `metadata.name`); the library itself carries no registration name. Task-oriented guidance lives in [Jsonnet libraries](/usage/jsonnet-libraries/). ## Example ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetLibrary metadata: name: mylib namespace: default spec: files: main.libsonnet: | { dashboard(env, cluster):: { title: '%s / %s' % [env, cluster], }, } ``` Exactly one of `spec.files` or `spec.sourceRef` must be set. Admission rejects CRs that set neither or both. ## Spec fields `JsonnetLibrarySpec` embeds `SnippetSource` directly (the same source shape used by `JsonnetSnippetSpec`). | Field | Type | Default | Description | |---|---|---|---| | `files` | map[string]string | — | Inline map of filename to Jsonnet/libsonnet source. Exactly one of `files` or `sourceRef` must be set. | | `sourceRef.apiVersion` | string | `source.toolkit.fluxcd.io/v1` | APIVersion of the referenced Flux source CR. | | `sourceRef.kind` | string | — | Kind of the referenced source. One of: `GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`. Required when `sourceRef` is set. | | `sourceRef.name` | string | — | Name of the referenced source CR. Required when `sourceRef` is set. Minimum length 1. | | `sourceRef.namespace` | string | library's namespace | Namespace of the referenced source CR. Cross-namespace references are rejected when the operator is started with `--no-cross-namespace-refs`. | | `sourceRef.path` | string | — (artifact root) | Subdirectory within the fetched tarball to treat as the library root. Empty means the archive root — required for jb-vendored trees (e.g. a `sourceRef` pointing at a Flux `OCIRepository` for a JOI image) where the library aliases resolve against the full vendor tree. | ## Status `JsonnetLibrary` shares the `SyncStatus` type with `JsonnetSnippet`, though no controller currently populates it. 
The `status` subresource exists so a future library reconciler can be added without a schema change. | Field | Type | Description | |---|---|---| | `observedGeneration` | int64 | `.metadata.generation` last reconciled. Not currently populated. | | `conditions` | []Condition | Standard apimachinery conditions. Not currently populated. | | `revision` | string | Not currently populated. | | `artifactURL` | string | Not currently populated. | | `lastSyncTime` | Time | Not currently populated. | | `history` | []RevisionEntry | Not currently populated. | For how snippets reference libraries, see [/api/jsonnetsnippet/](/api/jsonnetsnippet/) (`spec.libraries`) and [Jsonnet libraries](/usage/jsonnet-libraries/). --- # JsonnetSnippet Source: https://jaas.projects.metio.wtf/api/jsonnetsnippet/ `JsonnetSnippet` (`jsnip`) is the published unit of Jsonnet evaluation. The JaaS operator watches these namespaced CRs, evaluates the Jsonnet they describe, and upserts a Flux `ExternalArtifact` whose `status.artifact.url` points at the rendered result. Task-oriented guidance lives in [Operator mode](/usage/operator-mode/) and [Snippet sources](/usage/snippet-sources/). ## Example ```yaml apiVersion: jaas.metio.wtf/v1 kind: JsonnetSnippet metadata: name: hello-world namespace: default spec: serviceAccountName: hello-world-tenant entryFile: main.jsonnet output: rendered history: 3 interval: 10m suspend: false files: main.jsonnet: | local lib = import 'mylib/main.libsonnet'; lib.dashboard(std.extVar('env'), std.extVar('cluster')) libraries: - kind: JsonnetLibrary name: mylib importPath: mylib externalVariables: env: production cluster: eu-west-1 tlas: title: - My Dashboard ``` Exactly one of `spec.files` or `spec.sourceRef` must be set. Admission rejects CRs that set neither or both. 
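
The exactly-one-of rule can be expressed as a small check. This is only a sketch of the rule as stated: the `validate_source` helper is hypothetical, it works on a decoded `spec` dictionary, and it assumes an empty `files` map or an empty `sourceRef` counts as unset, which the text above does not specify.

```python
def validate_source(spec):
    """Return an error message if the spec breaks the exactly-one-of rule.

    Returns None when exactly one of `files` or `sourceRef` is set.
    Hypothetical helper; empty values are treated as unset (an assumption).
    """
    has_files = bool(spec.get("files"))
    has_source_ref = bool(spec.get("sourceRef"))
    if has_files == has_source_ref:
        return "exactly one of spec.files or spec.sourceRef must be set"
    return None
```

The same check applies to `JsonnetLibrary`, since both kinds share the same source shape.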
## Spec fields

| Field | Type | Default | Description |
|---|---|---|---|
| `serviceAccountName` | string | — | ServiceAccount the operator impersonates for every Kubernetes API call made on behalf of this snippet (source fetches, ExternalArtifact upserts). Must exist in the snippet's namespace. When empty, the operator's `--default-service-account` applies. Reconciliation is denied when neither is set (reason `ServiceAccountMissing`). |
| `entryFile` | string | `main.jsonnet` | File (relative to the resolved source root) that go-jsonnet evaluates. Restricted to `[A-Za-z0-9._/-]+` with no `..` segments. Maximum 255 characters. |
| `files` | map[string]string | — | Inline map of filename to Jsonnet source. Exactly one of `files` or `sourceRef` must be set. |
| `sourceRef.apiVersion` | string | `source.toolkit.fluxcd.io/v1` | APIVersion of the referenced Flux source CR. |
| `sourceRef.kind` | string | — | Kind of the referenced source. One of: `GitRepository`, `OCIRepository`, `Bucket`, `ExternalArtifact`. Required when `sourceRef` is set. |
| `sourceRef.name` | string | — | Name of the referenced source CR. Required when `sourceRef` is set. Minimum length 1. |
| `sourceRef.namespace` | string | snippet's namespace | Namespace of the referenced source CR. Cross-namespace references are rejected when the operator is started with `--no-cross-namespace-refs`. |
| `sourceRef.path` | string | — (artifact root) | Subdirectory within the fetched tarball to treat as the source root. Empty means the archive root. |
| `libraries` | []LibraryRef | — | `JsonnetLibrary` CRs importable from this snippet. Libraries not listed here are invisible to the snippet even when present in the cluster. See [Jsonnet libraries](/usage/jsonnet-libraries/). |
| `libraries[*].apiVersion` | string | `jaas.metio.wtf/v1` | APIVersion of the library CR. |
| `libraries[*].kind` | string | — | Kind of the library CR. Currently only `JsonnetLibrary` is accepted. Required. |
| `libraries[*].name` | string | — | Name of the referenced `JsonnetLibrary` CR. Required. Minimum length 1. |
| `libraries[*].namespace` | string | snippet's namespace | Namespace of the referenced library CR. Cross-namespace references are rejected when `--no-cross-namespace-refs` is set. |
| `libraries[*].importPath` | string | library's `metadata.name` | Alias used in `import` statements inside the snippet's Jsonnet source. Collisions with OCI-mounted shared library aliases are rejected at admission. |
| `tlas` | `map[string][]string` | — | Top-level arguments passed to the snippet's outermost function. A single-element value becomes a string TLA; multiple values are passed as a JSON-encoded array, matching the HTTP query-parameter convention. |
| `externalVariables` | map[string]string | — | Seeds `std.extVar` lookups for this snippet's evaluation. Keys that conflict with the operator's `--ext-var` set are rejected at admission; if admission is bypassed, the reconciler refuses the conflicting key with reason `ExternalVariableConflict`. |
| `output` | string | `rendered` | What bytes the published ExternalArtifact carries. `rendered`: the evaluated JSON (a single `rendered.json` in the tarball). `source`: the raw `.jsonnet`/`.libsonnet` files, for downstream consumers that re-evaluate themselves. |
| `suspend` | bool | `false` | When `true`, the operator skips the evaluation pipeline, leaves the existing ExternalArtifact in place, and reports `Ready=False` with reason `Suspended`. Setting back to `false` resumes reconciliation. Mirrors Flux's `spec.suspend` convention. |
| `history` | int32 | `1` | Number of past revisions retained in storage. Minimum 1, maximum 50. Setting to N > 1 lets downstream consumers pin to an older revision via its sha256 for rollback or blue-green flows. The keep-set is tracked in `status.history`. |
| `interval` | Duration | — (watch-only) | Period between successful reconciles regardless of watch events. Picks up state outside the watched graph (environment drift, OCI library refreshes, etc.). Bounded at admission to between `30s` and `24h`. Failed reconciles use controller-runtime's exponential backoff; `interval` governs only the steady-state cadence. |

## Status

`status` follows the `SyncStatus` shape shared by all JaaS CRs.

| Field | Type | Description |
|---|---|---|
| `observedGeneration` | int64 | `.metadata.generation` of the spec the controller last reconciled. Lets clients distinguish stale status from up-to-date status. |
| `conditions` | []Condition | Standard apimachinery conditions. The `Ready` condition summarises whether the most recent reconcile succeeded; `reason` and `message` carry per-stage failure detail. See Ready condition reasons below. |
| `revision` | string | `sha256:` content hash of the last successfully reconciled source. Empty until the first successful reconcile. |
| `artifactURL` | string | HTTP URL of the last successfully published artifact tarball. Preserved across subsequent failures so the last-known-good URL stays observable. Empty until the first successful publish. |
| `lastSyncTime` | Time | Timestamp of the most recent successful reconcile. |
| `history` | []RevisionEntry | Most-recent N revisions retained in storage (`N` = `spec.history`). Index 0 is the most recent (matches `revision`). Each entry carries `revision` (sha256:hex) and `time` (publish time). |

### Ready condition reasons

Every reason string is wire-stable — runbooks key off these values.

| Reason | Status | Description |
|---|---|---|
| `Synced` | True | Most recent reconcile completed end-to-end and produced a publishable artifact. |
| `Pending` | False | Snippet observed but not yet reconciled (transient). |
| `Suspended` | False | `spec.suspend` is `true`; evaluation is paused. |
| `InvalidSpec` | False | Spec-level validation failure (missing `main.jsonnet`, invalid source combination, etc.). |
| `LibraryNotFound` | False | A `spec.libraries` entry references a `JsonnetLibrary` CR that cannot be found. |
| `CrossNamespaceRefRejected` | False | `--no-cross-namespace-refs` is enabled and a library or source reference is outside the snippet's namespace. |
| `ExternalVariableConflict` | False | `spec.externalVariables` names a key already owned by the operator's `--ext-var` set. |
| `ServiceAccountMissing` | False | Neither `spec.serviceAccountName` nor `--default-service-account` is set. |
| `EvaluationFailed` | False | go-jsonnet returned a diagnostic (syntax error, runtime error, etc.). |
| `EvaluationTimeout` | False | The eval deadline fired before the snippet finished. |
| `SourceNotReady` | False | The referenced Flux source CR exists but is not yet `Ready` or has no `status.artifact`. |
| `SourceFetchFailed` | False | Fetching or verifying the source artifact failed (HTTP error, digest mismatch, tar corruption). |
| `SourceRefNotYetSupported` | False | `spec.sourceRef` is set but the operator is running without `--enable-flux-integration`. Start the operator with that flag, or remove `spec.sourceRef` from the snippet. |
| `DependencyCycle` | False | The snippet's dependency chain (via `spec.sourceRef` or `spec.libraries`) transitively points back at itself. |
| `ArtifactTooLarge` | False | Rendered content exceeds the operator's `--max-artifact-bytes` limit. |
| `RBACDenied` | False | An apiserver call failed with Forbidden, or the source CR's kind is not registered. Non-transient — backoff is disabled. The message names the verb and resource the cluster operator must grant. |

A runbook page for each reason lives at `/runbooks//` on this site.

See [Operator mode](/usage/operator-mode/) for lifecycle details and [ExternalArtifact output contract](/api/externalartifact/) for the artifact contract.
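To make the spec fields concrete, here is a minimal sketch that writes a `JsonnetSnippet` manifest to a temporary file. It is an illustration under assumptions, not a canonical example: the `apiVersion` is assumed to mirror the `jaas.metio.wtf/v1` default documented for library references, and every name and value (`example-snippet`, `team-a`, `snippet-renderer`, the Jsonnet body) is hypothetical.

```shell
#!/usr/bin/env sh
set -eu

# Write a hypothetical JsonnetSnippet manifest to a temporary file.
# Assumption: apiVersion mirrors the jaas.metio.wtf/v1 default listed for
# library references. All names and values below are illustrative only.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
apiVersion: jaas.metio.wtf/v1
kind: JsonnetSnippet
metadata:
  name: example-snippet
  namespace: team-a
spec:
  serviceAccountName: snippet-renderer
  entryFile: main.jsonnet
  files:
    main.jsonnet: |
      function(replicas='1') {
        team: std.extVar('team'),
        replicas: std.parseInt(replicas),
      }
  tlas:
    replicas: ["3"]
  externalVariables:
    team: team-a
  output: rendered
  history: 3
  interval: 10m
EOF

echo "wrote $manifest"
# Apply it once the values match your cluster:
# kubectl apply --filename "$manifest"
```

The sketch sets `files` rather than `sourceRef` because exactly one of the two may be set. The single-element `tlas` value reaches the function as the string `"3"`, which is why the Jsonnet converts it with `std.parseInt`.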
--- # Comparisons Source: https://jaas.projects.metio.wtf/comparisons/ Where JaaS fits next to other tools that evaluate Jsonnet or render Kubernetes objects. These pages compare the rendering approaches; for how the rendered output is deployed, see the consuming controller's own documentation — for ordered, gated rollouts that is [stageset-controller](https://stageset.projects.metio.wtf/). --- # JaaS and grafana-operator Source: https://jaas.projects.metio.wtf/comparisons/grafana-operator/ JaaS and the [grafana-operator](https://grafana.github.io/grafana-operator/) are not alternatives — they do different jobs and are commonly used together. JaaS *produces* dashboard JSON from Jsonnet; grafana-operator *consumes* dashboard JSON and reconciles it into a Grafana instance. ## Division of labour grafana-operator manages Grafana itself. It reconciles `GrafanaDashboard`, `GrafanaDatasource`, `GrafanaFolder`, and related resources into one or more Grafana instances, handling authentication, folder placement, datasource wiring, and drift correction inside Grafana. A `GrafanaDashboard` can take its dashboard model from inline JSON, a URL, a ConfigMap, a `grafana.com` dashboard ID, or a remote source. JaaS evaluates Jsonnet — including [grafonnet](https://grafana.github.io/grafonnet/) — and publishes the rendered dashboard JSON as a Flux `ExternalArtifact`. It knows nothing about Grafana: it renders JSON and stops there. So the two compose along a clean seam. You author dashboards in grafonnet, JaaS renders them to JSON, and grafana-operator takes that JSON and reconciles it into Grafana. Each tool owns one half of the pipeline and neither reaches into the other's domain. ## When grafana-operator alone is enough If your dashboards are already plain JSON, or you consume them by `grafana.com` dashboard ID, or you maintain them in the Grafana UI and export the model, then grafana-operator covers the whole workflow on its own. 
There is no Jsonnet to render, so there is nothing for JaaS to do. Reach for grafana-operator by itself whenever the dashboard model exists as static JSON.

## When to add JaaS

Add JaaS when your dashboards are authored in grafonnet (or any Jsonnet), typically to share panels, variables, and layout helpers across many dashboards instead of duplicating JSON. JaaS turns that Jsonnet into the JSON grafana-operator expects, with the same `jsonnet -J vendor` import resolution you use locally, so a dashboard renders identically on your workstation and in-cluster. grafana-operator then reconciles the rendered output as it would any other dashboard JSON.

## Wiring them together

The grafana-operator project documents the JaaS integration directly, including the `GrafanaDashboard` configuration that points at a JaaS-rendered artifact: [grafana-operator dashboard example with JaaS](https://grafana.github.io/grafana-operator/docs/examples/dashboard/jaas/readme/).

Keep all `GrafanaDashboard`, datasource, and folder configuration on the grafana-operator side; JaaS contributes only the rendering step and the `ExternalArtifact` it publishes.

The [Grafana dashboards](/tutorials/grafana-dashboards/) tutorial shows the JaaS side — authoring a grafonnet dashboard as a `JsonnetSnippet` and publishing the rendered JSON.

---

# JaaS vs jsonnet-controller

Source: https://jaas.projects.metio.wtf/comparisons/jsonnet-controller/

[pelotech/jsonnet-controller](https://github.com/pelotech/jsonnet-controller) is a Flux-style controller that builds Jsonnet inside the controller and applies the result to the cluster, with a configuration model compatible with [kubecfg](https://github.com/kubecfg/kubecfg). JaaS and jsonnet-controller both turn Jsonnet into Kubernetes objects under Flux, but they draw the boundary between rendering and deployment in different places.
## Coupled build-and-apply vs a rendering service

jsonnet-controller couples the two halves: one controller reads a Jsonnet source, evaluates it, and applies the resulting objects to the cluster. The rendered output lives inside the controller's reconcile loop; the unit of configuration is "build this Jsonnet and apply it here."

JaaS separates them. The JaaS operator renders a `JsonnetSnippet` and publishes the result as a content-addressed Flux `ExternalArtifact` — a tarball any source-controller-speaking consumer can fetch. JaaS does not apply anything. A separate consumer (a Flux `Kustomization`, a `HelmRelease`, or a [stageset-controller](https://stageset.projects.metio.wtf/) `StageSet`) reads the artifact and applies it. The rendered bytes are a first-class, addressable object that more than one consumer can reference, pin to a revision, or roll back to.

That same rendering is also reachable over HTTP, so callers that are not Flux consumers — a CI step, another service — can request a render from the same engine that produces the in-cluster artifacts. See [operator mode](/usage/operator-mode/) for how a snippet becomes an artifact.

## Where jsonnet-controller fits

jsonnet-controller is the more direct choice when its model matches your needs:

- **kubecfg compatibility.** If you already organise Jsonnet the kubecfg way — its import conventions, its top-level structure — jsonnet-controller consumes that directly without restructuring.
- **One object per build-and-apply.** When you want a single Flux-style resource that both renders a source and applies it, with no intermediate artifact to manage, jsonnet-controller keeps the pipeline to one moving part.

JaaS is the better fit when you want the rendered output to be an addressable, revisioned artifact that several consumers can share, when you want the same renderer available over HTTP to non-Flux callers, or when you want rendering and deployment owned by separate, independently-evolving controllers.
## The deployment-side comparison

The comparison above is from the rendering angle — rendering as a service that produces an artifact, versus build-and-apply in one controller. For the deployment-side comparison against jsonnet-controller — ordered and gated apply, health gating between stages, rollback — see stageset-controller's own page: [stageset-controller vs jsonnet-controller](https://stageset.projects.metio.wtf/comparisons/jsonnet-controller/).

---

# JaaS vs Tanka

Source: https://jaas.projects.metio.wtf/comparisons/tanka/

[Grafana Tanka](https://tanka.dev) and JaaS both render Kubernetes manifests from Jsonnet, and both build on the same [`jsonnet-bundler`](https://github.com/jsonnet-bundler/jsonnet-bundler) vendoring conventions. The difference is *where the rendering runs and how the result reaches the cluster*.

## The two models

Tanka renders and applies from a developer workstation or a CI runner. You organise your code as environments (`environments//main.jsonnet` plus a `spec.json`), run `tk show` or `tk export` to inspect the rendered objects, and `tk apply` to push them to the cluster the environment names in its `apiServer` field. The workstation or CI runner needs the Jsonnet toolchain, the vendored library tree, and credentials for the target cluster.

JaaS renders in-cluster. A `JsonnetSnippet` names the same Jsonnet entry file; the JaaS operator evaluates it continuously and publishes the result as a Flux `ExternalArtifact`. A Flux `Kustomization` (or `HelmRelease`, or for ordered rollouts a [stageset-controller](https://stageset.projects.metio.wtf/) `StageSet`) consumes that artifact and applies it through the cluster's own GitOps pull loop. No workstation or CI runner holds cluster credentials, and developers do not need the Jsonnet toolchain or the vendor tree to ship a change.
## When Tanka is the better fit

Tanka stays the stronger choice when its model matches the work:

- **Ad-hoc and exploratory renders.** `tk show` / `tk diff` give an immediate, local preview of exactly what would be applied, with no operator, no `ExternalArtifact`, and no consumer to configure.
- **Environments-as-code as the organising abstraction.** Tanka's environment/`spec.json` model — `namespace`, `injectLabels`, `apiServer` per environment, `tk env list` to enumerate them — is a first-class feature with no direct JaaS equivalent; JaaS pushes namespace and label concerns down to the consuming Flux `Kustomization` instead.
- **Direct, imperative apply.** When a human running `tk apply` against a named cluster is the intended workflow — small teams, bootstrap steps, break-glass operations — a pull loop adds machinery you may not need.

JaaS becomes the better fit when you want a pull-based GitOps loop, continuous reconciliation and drift correction, server-side rendering so laptops and CI hold no cluster credentials, per-tenant RBAC isolation on each render, and — through stageset-controller — ordered, gated progressive delivery.

## Your imports resolve identically

JaaS's importer implements the **same resolution as `jsonnet -J vendor`**, the scheme Tanka uses. A bare `import 'foo/main.libsonnet'` finds the library by alias; an absolute `import 'github.com/.../gen/...'` resolves against the vendored tree; sibling and `../` relative imports resolve from the importing file. A `jb`-vendored tree (k8s-libsonnet, grafonnet, and the like) renders the same bytes through JaaS as it does under `tk show`. Migration is mostly about *where the files live*, not *rewriting Jsonnet*. See [Jsonnet libraries](/usage/jsonnet-libraries/) for how libraries reach a snippet.

One behaviour to plan for: Tanka walks the evaluated object and extracts every nested `{apiVersion, kind, …}` into a resource stream. JaaS publishes exactly what the entry file evaluates to.
Make the entry file emit a flat manifest stream — wrap resources in a `v1` `List`, or apply `std.objectValues(...)` over your Tanka object — so the consuming Flux `Kustomization` applies every resource.

## A migration path

| Tanka | JaaS / Flux |
|---|---|
| `vendor/` (jb-installed libs) | `JsonnetLibrary` with a `sourceRef`, or OCI-mounted libraries |
| `lib/` (project-shared libs) | `JsonnetLibrary` with inline `files` in the same namespace |
| `environments//main.jsonnet` | one `JsonnetSnippet` (`spec.entryFile` + `spec.files`/`spec.sourceRef`) |
| `import` resolution (`-J vendor`) | identical — JaaS's in-memory importer |
| per-env `spec.json` / conditionals | `spec.externalVariables` (`std.extVar`) and `spec.tlas` (top-level args) |
| `spec.json` `namespace` | Flux `Kustomization.spec.targetNamespace` |
| `spec.json` `injectLabels` | Flux `Kustomization.spec.commonMetadata.labels` |
| `tk show` / `tk export` | the JaaS operator, continuously → `ExternalArtifact` |
| `tk apply` | Flux kustomize-/helm-controller (pull) |
| `tk diff` | Flux drift detection; stageset verification between stages |
| `tk env list` | `kubectl get jsonnetsnippets -A` |

The conversion in three moves:

1. **Move the libraries.** Shared, versioned libraries (k8s-libsonnet, grafonnet) become a `JsonnetLibrary` backed by an `OCIRepository`, or a static OCI-mounted library on the operator. A project-local `lib/` becomes a `JsonnetLibrary` with inline `files`. The import alias is preserved either way.
2. **Turn each environment into a `JsonnetSnippet`.** `environments/team-a/main.jsonnet` becomes one snippet. Prefer `spec.sourceRef` (a Flux `GitRepository`/`OCIRepository`/`Bucket`) over inline `files` for real repositories, so Flux versions the source and JaaS re-renders on every commit. Per-environment differences move to `spec.externalVariables` and `spec.tlas`, so two environments sharing one library become two snippets that differ only in those fields.
3. **Replace apply with GitOps.** `tk apply` goes away. A Flux `Kustomization` points its `sourceRef` at the snippet's `ExternalArtifact`; Flux applies it and reconciles it continuously. The [Deploying manifests](/tutorials/deploying-manifests/) tutorial walks this end to end.

## What changes

There is no single `tk diff` preview — use Flux's drift detection and stageset's between-stage verification instead. Namespace and label injection moves from `spec.json` to the consuming Flux `Kustomization`, where both are first-class fields. Your Jsonnet and libraries come across unchanged.

---

# JaaS vs the jsonnet CLI

Source: https://jaas.projects.metio.wtf/comparisons/jsonnet-cli/

The [`jsonnet`](https://jsonnet.org/) command-line tool — usually paired with [`jsonnet-bundler`](https://github.com/jsonnet-bundler/jsonnet-bundler) (`jb`) for vendoring libraries — evaluates Jsonnet to JSON on your machine. JaaS runs the **same go-jsonnet core** as a service. This is not a question of which implementation is correct; it is a question of *where the evaluation runs and what surrounds it*.

## What the service adds

Over a local binary invocation, JaaS adds:

- **An HTTP endpoint other systems can call.** `GET /jsonnet/` returns the evaluated JSON, with Top Level Arguments supplied as query parameters and external variables configured on the service. Anything that speaks HTTP can request a render without installing the toolchain or the vendor tree. See the [rendering endpoint](/usage/rendering-endpoint/) usage page.
- **An operator that turns a snippet into a revisioned Flux artifact.** With `--enable-flux-integration`, a `JsonnetSnippet` is evaluated continuously and published as a content-addressed `ExternalArtifact` that Flux consumers apply in-cluster — re-rendered automatically when its source changes. See [operator mode](/usage/operator-mode/).
- **Import resolution that matches `jsonnet -J vendor`.** JaaS resolves imports with the same semantics as the CLI's JPATH/vendor search, so the JSON a snippet produces under the service matches what the CLI produces locally.
- **Evaluation caps.** `--evaluation-timeout` bounds wall-clock time per render, `--max-stack` bounds call-stack depth, and `--max-concurrent-evals` bounds how many evaluations run at once — so one expensive snippet cannot exhaust a shared server. The CLI imposes none of these on its own.
- **Read-scope sandboxing.** Snippet-name resolution goes through Go's `os.Root`, which rejects `..` traversal and symlinks that escape the configured snippet directory, so a crafted request cannot read arbitrary files.

The [evaluation and security](/usage/evaluation-and-security/) page details the caps and the boundaries.

## When the plain CLI is the right tool

The CLI is the better choice for work that is local and one-off:

- **One-off local renders** — inspecting what a snippet produces, debugging a library, iterating on a dashboard before committing it.
- **CI scripts** — a build step that renders Jsonnet to JSON and hands it to another tool, where standing up a service would add a moving part for no gain.
- **Anywhere a service is unwanted** — no HTTP endpoint to call, no cluster, no artifact to consume.

Because JaaS runs the same go-jsonnet core, these are not mutually exclusive: you can keep `jsonnet` and `jb` on your workstation and in CI, and run JaaS in-cluster for the server-side and GitOps paths, with both producing the same JSON for the same input. The [local rendering](/tutorials/local-rendering/) tutorial shows JaaS used purely as a renderer, which keeps the local and in-cluster output aligned.

---

# Runbooks

Source: https://jaas.projects.metio.wtf/runbooks/

One page per Ready-condition `Reason` the operator sets, plus cross-cutting incident guides. Each page covers the symptom, the cause, how to diagnose it, and how to remediate.
The operator automatically appends a link to the matching page on each actionable Ready-condition message — `(runbook: https://jaas.projects.metio.wtf/runbooks//)` — so `kubectl describe` points straight at the remediation page. Healthy reasons (`Synced`, `Suspended`, `Pending`) get no link.

---

# ArtifactTooLarge

Source: https://jaas.projects.metio.wtf/runbooks/artifacttoolarge/

## Symptom

`READY=False`, `REASON=ArtifactTooLarge`. The Message states the rendered byte count and the configured cap.

## Cause

The snippet's rendered output exceeds the operator's `--max-artifact-bytes` (Helm: `operator.storage.maxArtifactBytes`). The cap is a defense-in-depth control — one runaway snippet shouldn't fill a shared storage volume.

Common triggers:

- a snippet generating massive arrays via `std.range(n)` with a much larger `n` than intended
- accidentally inlining a large data fixture via `importstr`
- forgetting to project / filter when fanning out per-tenant configs

## Diagnosis

Check the rendered size locally:

```shell
jsonnet /tmp/snippet/ | wc -c
```

## Remediation

Two paths:

1. **Shrink the output.** Inspect the snippet for unintended fan-out; project only the fields downstream consumers actually need.
2. **Raise the cap.** `--max-artifact-bytes=10485760` (10 MiB) gives more headroom. Pair with PVC sizing in the chart so the volume can hold N revisions' worth of the new maximum.

If many snippets are bumping against the cap, the cap itself may be too low for the workload — review the cluster-wide ratio of total storage to per-snippet revision count.

---

# CRD watch engagement failing

Source: https://jaas.projects.metio.wtf/runbooks/crd-watch-engagement/

Fires when `jaas_crd_watch_engagement_failures_total{gvk=...}` has increased above the per-hour threshold for the alert window.
JaaS lazy-watches Flux source CRDs: at boot, only the CRDs already installed get a watch; when a previously-missing CRD becomes `Established=True` (for example, because a cluster operator installed source-controller after JaaS was already running), the `crdWatcher` engages a runtime watch on it via `Controller.Watch`. **When that engagement fails, the apiextensions informer fires no further events on a stable CRD** — meaning the watch stays un-engaged until either the CRD object's metadata/status is changed by something else, or the operator restarts.

The visible symptom is that snippets with `spec.sourceRef.kind=` stop re-rendering on upstream source updates. There is no per-snippet status signal — they sit at their last-rendered revision, drifting from upstream.

## Symptom

- `JaaSCRDWatchEngagementFailing` alert is firing with `gvk` labelling the affected kind.
- `kubectl describe jsonnetsnippet` on snippets referencing that GVK shows a Ready condition that hasn't moved in hours/days.
- Upstream Flux source CRs (GitRepository, OCIRepository, Bucket, ExternalArtifact) show recent `status.artifact.revision` changes that the JaaS snippets aren't picking up.

## Diagnosis

### Step 1 — confirm the CRD is actually installed and Established

```shell
kubectl get crd .source.toolkit.fluxcd.io \
  --output jsonpath='{.status.conditions[?(@.type=="Established")].status}{"\n"}'
```

Expect `True`. If the CRD is not installed or not yet Established, the watcher is correct to skip it; install the CRD or wait for it to become Established.

### Step 2 — check the operator's RBAC on the source kind

```shell
kubectl auth can-i list .source.toolkit.fluxcd.io \
  --as=system:serviceaccount::
kubectl auth can-i watch .source.toolkit.fluxcd.io \
  --as=system:serviceaccount::
```

If either is "no", the chart's `operator-tenants` ClusterRole (or per-namespace RoleBinding when `watchNamespaces` is set) is missing the `get/list/watch` verbs on this kind. Update the chart's `FluxSourceKinds` mapping or add the verb manually.
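If you choose to add the verbs manually, a standalone ClusterRole is one way to do it. The sketch below only generates a manifest for review. It rests on assumptions: the role name `jaas-flux-source-reader` is hypothetical, the resource plurals are taken to be the usual Flux source-controller CRD names for the four kinds listed above, and the binding to the operator's ServiceAccount is deliberately left out because it depends on your install.

```shell
#!/usr/bin/env sh
set -eu

# Generate a hypothetical ClusterRole that grants read access to the Flux
# source kinds. The role name is illustrative, and a (Cluster)RoleBinding to
# the operator's ServiceAccount is still required; it is not shown here.
role_file="$(mktemp)"
cat > "$role_file" <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jaas-flux-source-reader
rules:
  - apiGroups: ["source.toolkit.fluxcd.io"]
    resources:
      - gitrepositories
      - ocirepositories
      - buckets
      - externalartifacts
    verbs: ["get", "list", "watch"]
EOF

echo "wrote $role_file"
# Review the file, then apply it:
# kubectl apply --filename "$role_file"
```

After applying the role and a matching binding, re-run the `kubectl auth can-i` checks from Step 2 to confirm both answers are "yes".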
### Step 3 — check controller-runtime cache state

```shell
kubectl --namespace logs | grep -E 'engage|Failed to watch|cache' | tail -20
```

Look for `cache reconnect`, `informer failed`, or `Watch failed: forbidden`. A transient cache reconnect during a heavy load period can trip engagement once; the DD7 bounded-retry mechanism re-engages automatically. Sustained failures point at RBAC or a misconfigured `MetricsBindAddress`.

## Remediation

1. **Fix the verb / kind / RBAC** issue identified above.
2. **Roll the operator pod** to force a fresh `SetupWithManager` pass, which re-detects every Flux CRD and re-engages watches that succeed on first try:

   ```shell
   kubectl --namespace rollout restart deployment
   ```

3. **Verify** the counter stops increasing and the alert clears.

## When the alert is noisy

If `jaas_crd_watch_engagement_failures_total` ticks once at boot but never again, that's the expected DD7 bounded-retry behavior: the first attempt failed (transient race during cache start), the retry succeeded. Raise `crdWatchEngagementFailuresPerHour` if the boot-time blip is noisy enough to page.

---

# CrossNamespaceRefRejected

Source: https://jaas.projects.metio.wtf/runbooks/crossnamespacerefrejected/

## Symptom

`READY=False`, `REASON=CrossNamespaceRefRejected`. The Message names the offending reference (a library or a sourceRef).

## Cause

The operator is running with `--no-cross-namespace-refs=true` (the chart default) and the snippet references a library or Flux source in a different namespace. This is a deliberate isolation control — it mirrors Flux's `--no-cross-namespace-refs` and stops a tenant in namespace A from reaching libraries / sources in namespace B without an explicit relationship.

## Diagnosis

Inspect the spec and identify which reference points outside the snippet's namespace:

```shell
kubectl --namespace get jsonnetsnippet --output yaml | grep -E "namespace:|sourceRef:|libraries:"
```

## Remediation

Three options, by isolation strength:

1. **(recommended)** Duplicate the library / source CR into the snippet's namespace.
2. Promote the library to an OCI volume — mount via the chart's `additionalLibraries` map. It becomes part of the operator's filesystem, available to every snippet without a cross-namespace CR ref.
3. **(loosen isolation, cluster-wide)** Set `--no-cross-namespace-refs=false` on the operator. This affects every tenant in the cluster — only do this when tenants are mutually trusting.

---

# DependencyCycle

Source: https://jaas.projects.metio.wtf/runbooks/dependencycycle/

## Symptom

`READY=False`, `REASON=DependencyCycle`. The Message names the snippet that closes the cycle.

## Cause

The snippet's dependency chain (via `spec.sourceRef` or `spec.libraries`) transitively points back at the snippet itself. The reconciler detects this and refuses to publish so chained snippets don't loop forever (each republish would trigger every downstream snippet to re-render, which would re-trigger the upstream, and so on).

Two cycle shapes:

1. **Direct sourceRef cycle:** `A.spec.sourceRef → ExternalArtifact/A`. A snippet sourcing from its own published artifact.
2. **Library-mediated cycle:** `A.spec.libraries → JsonnetLibrary/L`, where `L.spec.sourceRef → ExternalArtifact/A` (or a longer chain back to A).

The validating webhook (`--enable-webhook`) rejects new CRs that introduce a cycle at admission; the reconciler check is a fallback for when admission is bypassed or the cycle is introduced retroactively (e.g., adding a new library that closes a loop with existing snippets).

## Diagnosis

Walk the chain manually:

```shell
kubectl get jsonnetsnippet --output jsonpath='{.spec.sourceRef}' && echo
# Then inspect what that sourceRef points at, and what it sources from in turn.
```

For library-mediated cycles, the chain is:

```text
snippet A.spec.libraries[i]
  → JsonnetLibrary L
  → L.spec.sourceRef
  → ExternalArtifact X
  → snippet that publishes X
```

If the publishing snippet at the end is A, you have a cycle.
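When the chain is long, writing the references down as edges and walking them mechanically can help. The following is a minimal sketch, not the reconciler's algorithm: the `from to` edge-list format and the `reaches_itself` helper are hypothetical, and the edges are assumed to have been collected by hand from the `kubectl` output above.

```shell
#!/usr/bin/env sh
set -eu

# reaches_itself START: read "from to" edges on stdin and report whether a
# breadth-first walk from START can get back to START.
reaches_itself() {
  awk -v start="$1" '
    { adjacent[$1] = adjacent[$1] " " $2 }   # collect outgoing edges per node
    END {
      queue[1] = start; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        count = split(adjacent[node], targets, " ")
        for (i = 1; i <= count; i++) {
          target = targets[i]
          if (target == start) { print "cycle"; exit }
          if (!(target in seen)) { seen[target] = 1; queue[++tail] = target }
        }
      }
      print "no cycle"
    }'
}

# Hypothetical edges for a library-mediated cycle: snippet a imports library l,
# library l sources from the artifact that snippet a itself publishes.
edges='JsonnetSnippet/a JsonnetLibrary/l
JsonnetLibrary/l ExternalArtifact/a
ExternalArtifact/a JsonnetSnippet/a'

printf '%s\n' "$edges" | reaches_itself JsonnetSnippet/a
```

The back-edge to remove is the last hop that returns to the starting snippet, here `ExternalArtifact/a JsonnetSnippet/a`.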
## Remediation

Break the cycle by removing the back-edge. Common fixes:

- detach a library from its sourceRef (inline its files instead, if small)
- have the upstream snippet publish a smaller artifact the downstream doesn't need to re-consume
- restructure so the shared data lives in a static ConfigMap referenced as a sourceRef-equivalent, not in a snippet output

---

# Eval-concurrency saturation

Source: https://jaas.projects.metio.wtf/runbooks/eval-saturation/

Not tied to a single `Reason` — this page covers what to do when the global concurrent-eval cap (`--max-concurrent-evals`) is full and JaaS is shedding new evaluations. The cap exists because the synchronous go-jsonnet API has no context-aware cancellation: once an eval starts it runs to natural completion, so an unbounded queue lets a runaway snippet pile up goroutines that outlive every caller's deadline.

## Symptom

One or more of:

- `JaaSEvalSaturation` is firing — `jaas_eval_in_flight / jaas_eval_max_concurrent` has been above the threshold (default `0.9`) for the alert window.
- `JaaSEvalRejected` is firing — `rate(jaas_eval_unavailable_total[5m])` has been above the threshold.
- HTTP clients see `503 Service Unavailable` with body `{"error": "evaluation_unavailable", "message": "concurrent-eval cap is full; retry after backoff"}`.
- `kubectl describe jsonnetsnippet` shows recurring `Warning EvalUnavailable` events with message `reconcile deferred for 1s by --max-concurrent-evals`. Ready condition stays untouched (backpressure is not failure).
- `jaas_eval_outstanding_timed_out` is also elevated — confirms the runaway-snippet diagnosis: orphaned evals are pinning slots while their parents have already given up.

## Diagnosis: why is the cap full?

The cap fills for two distinct reasons. The right remediation depends on which.

### Path A — runaway snippet (goroutines outliving their ctx)

Read the leak gauge.
If it's non-zero and trending up, evaluations are starting but not finishing — almost always a single snippet whose work dwarfs `--evaluation-timeout`.

```shell
# Live count of evals whose parent reconcile already timed out:
kubectl --namespace exec deploy/jaas -- \
  wget -qO- http://localhost:8083/metrics | grep jaas_eval_outstanding_timed_out
```

To find the culprit, scan recent reconcile logs for `Jsonnet evaluation timed out` followed by repeated `EvalUnavailable` warnings on the same snippet:

```shell
kubectl --namespace logs deploy/jaas --since=15m \
  | grep -E 'EvaluationTimeout|EvalUnavailable' \
  | sort | uniq -c | sort -rn | head
```

The snippet whose name dominates that list is the culprit. Common causes:

- Deep recursion that takes seconds-to-minutes to complete naturally even after the parent deadline fires.
- Pathological library import that triggers go-jsonnet's worst-case eval order.
- A `std.foldl` over a generated array of millions of entries.

### Path B — genuine load above the cap

Leak gauge is at zero (or steady, not growing), `jaas_eval_in_flight` is pegged near the cap, and many distinct snippets show `EvalUnavailable` events. The cap is sized too small for the workload.

```shell
# Distribution of which snippets are seeing rejections — a flat
# distribution across many snippets is path B; a single dominant
# snippet is path A.
kubectl --namespace exec deploy/jaas -- \
  wget -qO- http://localhost:8083/metrics \
  | grep jaas_snippet_eval_unavailable_total
```

## Remediation

### Path A — runaway snippet

1. **Suspend the offender** to stop new evals while you fix the snippet:

   ```shell
   kubectl --namespace patch jsonnetsnippet --type merge \
     --patch '{"spec":{"suspend":true}}'
   ```

2. **Inspect the snippet** to understand the cost. Lowering `--max-stack` is a blunt clamp that rejects pathological recursion before it can leak. The chart's `operator.maxStack` defaults to 500; pull it down to ~200 if the snippet doesn't legitimately need deeper recursion.
3. **Tighten `--evaluation-timeout`** if the snippet's natural completion time is the load-bearing factor. A 5s default lets a 60s pathological eval leak for nearly a minute; dropping to 1s shrinks the worst-case leak window.
4. **Re-enable** after the snippet spec is fixed:

   ```shell
   kubectl --namespace patch jsonnetsnippet --type merge \
     --patch '{"spec":{"suspend":false}}'
   ```

### Path B — genuine load

1. **Raise the cap** if the operator has CPU headroom. The default is `max(GOMAXPROCS*4, 16)`; double it via the chart:

   ```shell
   helm upgrade --reuse-values \
     --set arguments.maxConcurrentEvals=64
   ```

   Each in-flight eval pins roughly one CPU when actively running, so the practical ceiling is bounded by node CPU. Past 2-3× GOMAXPROCS the gains drop sharply — more contention, same throughput.
2. **Tune the per-snippet rate limiter** if a small number of snippets dominate the request rate. `--rerender-rate` + `--rerender-burst` cap each snippet's reconcile frequency independent of the global eval cap.
3. **Scale horizontally** if a single replica can't keep up even at the raised cap. The chart's `replicas.max` controls the HPA ceiling; combined with the storage layer's leader election (S3 backend) you get multi-replica HA where every replica reads but only the lease-holder writes.

## When NOT to raise the cap

If the leak gauge is non-zero AND growing, raising the cap lets more goroutines pile up before the next saturation event. Diagnose path A first. The cap is a backpressure boundary, not a throughput knob.

## Disable the gate (not recommended)

`--max-concurrent-evals=0` disables the gate entirely. The leak gauge keeps working, but rejections never fire — a single runaway snippet can OOM the pod. Use only if you've sized the workload precisely and want to surface saturation purely via the leak gauge.

---

# EvaluationFailed

Source: https://jaas.projects.metio.wtf/runbooks/evaluationfailed/

## Symptom

`READY=False`, `REASON=EvaluationFailed`.
The Message contains the raw go-jsonnet diagnostic — file name, line, column, and the underlying error. ## Cause The snippet failed to evaluate. Three broad categories: - **Syntax error** — unclosed brace, missing comma, bad indent. - **Runtime error** — `std.extVar('missing')` for an unset variable, division by zero, `error '...'` thrown explicitly. - **Import error** — `import 'missing.libsonnet'` resolves to nothing in the snippet's file map or library imports. ## Diagnosis Read the Message — it names the file and line. Reproduce locally: ```shell # Pull the snippet's files into a tempdir, then evaluate. kubectl get jsonnetsnippet --output json | jq -r '.spec.files["main.jsonnet"]' > /tmp/main.jsonnet jsonnet /tmp/main.jsonnet ``` For sourceRef-based snippets, fetch the tarball: ```shell SOURCE_URL=$(kubectl get gitrepository --output jsonpath='{.status.artifact.url}') mkdir -p /tmp/snippet curl -sL "$SOURCE_URL" | tar -xz -C /tmp/snippet jsonnet /tmp/snippet/ ``` ## Remediation Fix the snippet (or its libraries / source) and re-apply. The diagnostic message can leak the on-disk path of the snippet — fine in-cluster, worth gating behind a flag if exposed to untrusted callers in the future. --- # EvaluationTimeout Source: https://jaas.projects.metio.wtf/runbooks/evaluationtimeout/ ## Symptom `READY=False`, `REASON=EvaluationTimeout`. The snippet's eval ran longer than the operator's `--evaluation-timeout`. ## Cause Snippets are evaluated synchronously per reconcile. The deadline is wall-clock, not CPU — but go-jsonnet has no mid-evaluation cancellation, so a snippet that runs over the deadline still keeps consuming CPU on the operator pod until it returns naturally.
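
The gap between a wall-clock deadline and an evaluator that cannot be cancelled can be reproduced in miniature with plain shell. This is a minimal sketch, not JaaS code: `run_with_deadline` is a hypothetical helper, and a `sleep` stands in for a go-jsonnet evaluation that cannot be interrupted. The caller stops waiting at the deadline, but the worker keeps running until it returns on its own.

```shell
#!/bin/sh
# Hypothetical illustration only: nothing here is part of JaaS.
# run_with_deadline DEADLINE_SECONDS WORK_SECONDS
#   Starts a worker that takes WORK_SECONDS, checks on it after
#   DEADLINE_SECONDS, and never kills it (mirroring an evaluator
#   that has no cancellation hook).
run_with_deadline() {
  deadline=$1
  work=$2
  dir=$(mktemp -d)

  # The worker drops a marker file when it finishes naturally.
  ( sleep "$work"; : > "$dir/done" ) &
  worker=$!

  sleep "$deadline"
  if [ -e "$dir/done" ]; then
    echo "finished within deadline"
  else
    echo "deadline exceeded; worker still running"
  fi

  # The worker only stops when it returns on its own.
  wait "$worker"
  rm -rf "$dir"
}
```

Calling `run_with_deadline 1 2` reports the deadline as exceeded after one second, yet the function itself only returns after the full two seconds of work. That extra second is the shell analogue of the CPU a timed-out snippet keeps consuming on the operator pod.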
Common triggers: - a snippet recursing deeper than necessary (try lowering `--max-stack` to surface this as a stack-limit error instead, then optimize) - a snippet that loads a huge sourceRef tarball and walks it - a snippet that calls `std.set` / `std.uniq` over a very large array ## Diagnosis Time it locally: ```shell time jsonnet /tmp/snippet/ ``` If it takes seconds locally, the operator's bound is too tight. If it takes minutes locally, the snippet itself is the problem. ## Remediation Two paths: 1. **Optimize the snippet.** Memoize repeated work into `local` bindings, narrow the input set, avoid `std.flattenDeepArrays` over deep trees. 2. **Raise the operator's bound.** `--evaluation-timeout=30s` (default `5s`) gives more headroom. Pair with `resources.cpu` headroom in the chart so the slow snippet doesn't starve other reconciles. For pathological inputs, consider splitting the snippet — render the slow part less often via a separate snippet others source from (see `examples/operator/chained-snippets.yaml`). --- # ExternalVariableConflict Source: https://jaas.projects.metio.wtf/runbooks/externalvariableconflict/ ## Symptom `READY=False`, `REASON=ExternalVariableConflict`. The Message names the conflicting key. ## Cause The snippet's `spec.externalVariables` declares a key that the operator already owns via `--ext-var` (cluster operator-level). Operator keys win by design — they're how the cluster admin pins cluster-scoped values like `cluster`, `region`, `environment` so a tenant snippet can't override them. ## Diagnosis ```shell # Which keys does the operator own? kubectl --namespace get pod --selector app.kubernetes.io/name=jaas --output yaml | grep -A1 "\--ext-var=" ``` Cross-reference with the snippet's `spec.externalVariables`. ## Remediation Rename the conflicting key in the snippet, or remove it from the snippet entirely (the operator-level value flows through automatically). 
If the snippet legitimately needs a different value, that's a structural problem — the snippet shouldn't ship with an opinion that overrides a cluster-wide invariant. Re-discuss with the cluster admin. The validating webhook (`--enable-webhook`) catches this at admission so `kubectl apply` rejects it before it lands. The reconciler enforcing the same rule is a fallback for when admission is bypassed. --- # High reconcile latency Source: https://jaas.projects.metio.wtf/runbooks/reconcile-latency/ Linked from the `JaaSReconcileLatencyHigh` alert. Fires when the controller-runtime `controller_runtime_reconcile_time_seconds` histogram p99 exceeds the configured threshold (default 30s) for the alert window. ## Symptom ```text ALERTS{alertname="JaaSReconcileLatencyHigh", controller="jsonnetsnippet"} ``` - `kubectl get jsonnetsnippet` shows status updates trickling in well after spec changes. - Operator pod CPU is moderate-to-high but the queue is draining (distinguishes this from [workqueue-saturation.md](workqueue-saturation.md), where the queue itself is growing). ## Cause Reconcile latency is the wall-clock cost of one `Reconcile()` call. Inside the call, JaaS does (in order): 1. `Get` the snippet from the cache. 2. Run the dependency-cycle BFS (one Get per touched node). 3. Resolve the source (inline files, or sourceRef → Fetcher: source-CR Get + tarball HTTP fetch + tar extract). 4. Resolve libraries (one Get per `LibraryRef`). 5. Evaluate the snippet via go-jsonnet. 6. Publish via the storage backend (`Put`). 7. Status update + ExternalArtifact upsert. Slow reconciles are almost always one of: - **Slow `Fetcher`** — a large tarball over a slow network, or a misbehaving source-controller (digest mismatch retries). - **Heavy jsonnet evaluation** — a snippet that imports lots of large libraries or runs unbounded recursion below the stack limit. - **Slow `Publisher`** — S3 throttling, a slow PVC, or large rendered output (close to `--max-artifact-bytes`). 
- **Cycle-detection blowup** — a dense graph of snippets cross-referencing via `sourceRef`. The BFS is O(V+E) but each visit is a `Get`. ## Diagnosis ```shell # Where is the time going? OTel spans break Reconcile into sub-stages. # Requires --tracing-endpoint set on the operator. kubectl --namespace get deploy jaas \ --output jsonpath='{.spec.template.spec.containers[0].args}' \ | tr ',' '\n' | grep tracing # Without tracing: the histograms expose enough to triangulate. kubectl --namespace port-forward svc/jaas-metrics 8083:8083 & curl -s localhost:8083/metrics | grep -E 'reconcile_time|rendered_bytes' ``` The `jaas_snippet_rendered_bytes` histogram tells you whether a slow Publisher is the cause (large outputs) vs. a slow Fetcher (small outputs but the histogram is dominated by upstream IO). For a single suspect snippet, force a reconcile under load and observe: ```shell kubectl annotate jsonnetsnippet / \ jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite kubectl --namespace logs deploy/jaas --tail=50 | grep ``` ## Remediation - **Slow Fetcher.** Narrow `spec.sourceRef.path` to the subdirectory the snippet actually needs. Tarballs balloon when an entire monorepo is published; the filter trims what JaaS has to download. - **Heavy eval.** Cap `--max-stack` to bound runaway recursion. Profile the snippet locally via `jsonnet` (the CLI) — the operator's evaluation is identical. - **Slow Publisher.** See [storage-recovery.md](storage-recovery.md) for backend-specific tuning. - **Cycle-detection blowup.** Reorganize snippets so the cross-reference graph is shallow; cycle detection visits every reachable node, so a fan-out of N snippets multiplies the cost. - **OTel for forensics.** Enable `--tracing-endpoint` and the per-stage spans turn this from guessing into measurement. The chart values key is `operator.tracing.endpoint`. ## Prevention - Pair the alert with `JaaSSnippetArtifactGrowing` ([artifacttoolarge.md](artifacttoolarge.md)). 
A snippet whose rendered bytes are climbing is almost always headed for a latency spike too. - For multi-replica HA, leader election keeps only one replica in the reconcile loop — sustained latency on the lease-holder is what matters; standby latency is not measured. --- # InvalidSpec Source: https://jaas.projects.metio.wtf/runbooks/invalidspec/ ## Symptom `READY=False`, `REASON=InvalidSpec`. The condition Message names which field is at fault. ## Cause Spec-level validation that admission should have caught but the reconciler is enforcing as a fallback: - `spec.entryFile` is empty - both `spec.files` and `spec.sourceRef` are set (mutually exclusive) - neither `spec.files` nor `spec.sourceRef` is set - `spec.entryFile` does not match any key in the resolved file map ## Diagnosis ```shell kubectl describe jsonnetsnippet ``` Read the Message — it names the field. ## Remediation Fix the spec and reapply. If the validating webhook is enabled (`--enable-webhook`), `kubectl apply` rejects the invalid spec at admission instead of letting it land and fail later. If you're seeing `InvalidSpec` on apply through the webhook, that's a bug — file an issue with the rejected manifest. --- # LibraryNotFound Source: https://jaas.projects.metio.wtf/runbooks/librarynotfound/ ## Symptom `READY=False`, `REASON=LibraryNotFound`. The Message names the missing library. ## Cause A `spec.libraries[*]` entry references a `JsonnetLibrary` CR that the operator cannot Get. Common reasons: - the library CR doesn't exist (typo, wrong namespace, not yet applied) - the tenant ServiceAccount doesn't have `get` on the library kind in the library's namespace - the library is in a different namespace and `--no-cross-namespace-refs=true` ## Diagnosis ```shell # Confirm the library exists. kubectl --namespace get jsonnetlibrary # Test the tenant's RBAC. kubectl auth can-i get jsonnetlibrary \ --as=system:serviceaccount:: -n ``` If `can-i` returns `no`, RBAC is the gap. 
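
The two checks above can be folded into one helper that reports which of the listed causes applies. This is a sketch under assumptions: `diagnose_library` and its four arguments are hypothetical names, the tenant ServiceAccount is assumed to live in the snippet's namespace, and the current `kubectl` context must be allowed to impersonate it. The helper cannot read the operator's `--no-cross-namespace-refs` setting, so it only flags that case for manual review.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with JaaS.
# diagnose_library SNIPPET_NS SERVICE_ACCOUNT LIBRARY_NS LIBRARY_NAME
diagnose_library() {
  snippet_ns=$1
  sa=$2
  lib_ns=$3
  lib=$4

  # 1. Does the library CR exist at all (as seen by the current user)?
  if ! kubectl --namespace "$lib_ns" get jsonnetlibrary "$lib" >/dev/null 2>&1; then
    echo "missing: JsonnetLibrary $lib_ns/$lib not found"
    return 1
  fi

  # 2. Can the tenant ServiceAccount read it? can-i prints "yes" or "no".
  allowed=$(kubectl auth can-i get jsonnetlibrary \
    --as="system:serviceaccount:$snippet_ns:$sa" \
    --namespace "$lib_ns" 2>/dev/null) || true
  if [ "$allowed" != "yes" ]; then
    echo "rbac: $sa cannot get jsonnetlibrary in $lib_ns"
    return 1
  fi

  # 3. The library is readable but lives in another namespace.
  if [ "$snippet_ns" != "$lib_ns" ]; then
    echo "cross-namespace: check --no-cross-namespace-refs on the operator"
    return 1
  fi

  echo "ok: library exists and is readable"
}
```

The checks run in the same order as the cause list, so the first line printed points at the first remediation worth trying.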
## Remediation - create the library CR if it doesn't exist - grant `get` on `jsonnetlibraries.jaas.metio.wtf` to the tenant SA via a Role + RoleBinding - if cross-namespace is intended, either move the library into the snippet's namespace or set `--no-cross-namespace-refs=false` cluster-wide --- # Operator pod not ready Source: https://jaas.projects.metio.wtf/runbooks/operator-pod-down/ Linked from the `JaaSOperatorPodDown` alert. Fires when at least one jaas pod has been `Ready=False` for the alert window (default 5m). ## Symptom ```text ALERTS{alertname="JaaSOperatorPodDown", namespace=""} ``` - New JsonnetSnippets stay `Ready=Unknown` indefinitely (no controller reconciling them). - Existing snippets keep serving stale `ExternalArtifact` content (the storage HTTP server may still respond; reconciliation is the part that stopped). ## Cause One of the chart's three probes is failing: - **Liveness** (`/live`) is an unconditional 200 — a failure here means the pod's HTTP server itself isn't responding (deadlock, OOM, panic). - **Readiness** (`/ready`) consults `HealthState`, which goes `false` during `drainBeforeShutdown` or before the listeners bind. - **Startup** (`/start`) returns 503 until `MarkStarted()` is called — bind failures (port already in use, permission denied) keep the pod stuck here forever. Frequent causes: 1. **Bind failure** on one of the HTTP servers (jsonnet, management, storage, metrics, webhook). The pod logs a clear "listen tcp: address already in use" or similar at boot. 2. **OOMKilled** — a pathological snippet allocated a huge object; the kubelet killed the pod. `kubectl describe pod` shows `Last State: Terminated, Reason: OOMKilled`. 3. **Image pull failure** — registry rate limit, wrong tag, missing pull secret. 4. **TLS cert missing or unreadable** when `operator.webhook.enabled=true` and the cert-manager Secret hasn't materialized. 5.
**Lease contention** that leaves no replica as leader (every replica reconnecting to renew, never holding the lease). ## Diagnosis ```shell # Which probes are failing? Events tell you. kubectl --namespace describe pod --selector app.kubernetes.io/name=jaas # Pod logs — the boot sequence prints every listener it binds. kubectl --namespace logs --selector app.kubernetes.io/name=jaas --tail=300 # Compare against the expected listener set. kubectl --namespace get svc --selector app.kubernetes.io/name=jaas ``` For OOM: ```shell kubectl --namespace top pod --selector app.kubernetes.io/name=jaas kubectl --namespace get pod --selector app.kubernetes.io/name=jaas --output yaml \ | grep -A3 lastState ``` For lease problems (multi-replica only): ```shell kubectl --namespace get lease -operator --output yaml ``` `holderIdentity` flipping every renewal interval is a sign of network flake or apiserver pressure — the replicas can't keep the lease stable. ## Remediation - **Bind failure.** Free the colliding port (often `8080`, when the controller-runtime metrics endpoint defaults conflict with the jsonnet HTTP port — confirm `--metrics-bind-address` is `:8083`). - **OOMKilled.** Raise `resources.memory`, then identify the runaway snippet (the bench in `internal/operator/bench_test.go` is a regression baseline; the runaway is usually obvious from `jaas_snippet_rendered_bytes`). - **Image pull.** Standard k8s drill: check secrets, registry, tag. - **TLS cert.** With `certMode=cert-manager`, confirm the Issuer / Certificate are ready. With `certMode=self-signed`, the operator regenerates on boot — a permission error on the cert-dir mount blocks it. - **Lease flap.** Try `kubectl --namespace delete lease -operator` to force a fresh election. If it keeps flapping, the cluster has bigger problems than JaaS. ## Prevention - Pin `replicas.max: 1` and `LeaderElectionReleaseOnCancel: true` (chart defaults). 
Multi-replica is only worth it for storage-backed HA — single-replica is the simpler operational story. - Run the cleanup Job (`operator.cleanupOnDelete.enabled: true`, chart default) so a `helm uninstall` of a wedged operator unwinds the finalizers instead of leaving orphaned snippets. - Pair this alert with `JaaSControllerWorkqueueDepthHigh` ([workqueue-saturation.md](workqueue-saturation.md)). A pod-down event almost always coincides with a saturated queue from snippets piling up. --- # Pending Source: https://jaas.projects.metio.wtf/runbooks/pending/ ## Symptom `kubectl get jsonnetsnippet` shows `READY=Unknown` (or `False` with `REASON=Pending`) immediately after the snippet was created or its spec was updated. ## Cause The operator has observed the CR but hasn't completed its first reconcile pass yet. Transient by design. ## Diagnosis ```shell kubectl describe jsonnetsnippet ``` If the timestamp on the `Pending` condition is older than ~30 seconds, the operator is either: - not running (`kubectl --namespace get pods`) - backed up on its work queue (check `kubectl logs deploy/jaas` and the `workqueue_depth` metric) - not the leader (multi-replica install, `kubectl --namespace get lease` shows the holder) ## Remediation If transient, wait. If persistent: - restart the operator: `kubectl rollout restart deploy/jaas` - inspect the operator's logs for errors If the snippet is stuck in `Pending` because the work queue is saturated, increase replicas (with leader election ON) or raise the rate-limiter budget (`--rerender-rate`, `--rerender-burst`). --- # RBACDenied Source: https://jaas.projects.metio.wtf/runbooks/rbacdenied/ ## Symptom ```text kubectl describe jsonnetsnippet ... Status: Conditions: Reason: RBACDenied Status: False Type: Ready Message: RBAC denied reading the source CR — grant the tenant ServiceAccount get on the source kind ... 
``` Or for a missing CRD: ```text Message: source CR's kind is not registered with the apiserver — install the corresponding CRD ... ``` The reconciler logs at warn level and stops engaging backoff for this snippet. The next reconcile happens only when the snippet's spec changes, a referenced library / source CR's status flips, or `spec.interval` ticks — so the workqueue isn't burning cycles on a permanently-failing call. ## Cause The apiserver returned `Forbidden` on a call the reconciler had to make. Three call sites can surface this: 1. **Source-CR read.** The tenant ServiceAccount lacks `get` on the kind named by `spec.sourceRef.kind`. The fix is on the tenant's `Role` / `RoleBinding`. 2. **Library-CR read.** The tenant SA lacks `get` (and typically `list`) on `jsonnetlibraries` in the snippet's namespace. 3. **ExternalArtifact write.** The tenant SA lacks `create` / `update` / `patch` on `externalartifacts`. This is the publish step — the rendered bytes are computed but the operator can't write them back as the impersonating client. The `NoMatchError` variant means the apiserver doesn't know about the resource kind at all — typically because the corresponding CRD (usually Flux's source-controller) isn't installed in the cluster. ## Diagnosis `kubectl describe` shows the operator's classified message. 
The verbatim apiserver error (`forbidden: ServiceAccount X cannot get resource Y in namespace Z`) is appended after the operator's classification, so you can read off: - Which SA tried the call (`system:serviceaccount::`) - Which verb it lacked (`cannot get`, `cannot create`, `cannot patch`) - Which resource (`gitrepositories.source.toolkit.fluxcd.io`, `jsonnetlibraries.jaas.metio.wtf`, `externalartifacts.source.toolkit.fluxcd.io`) Verify the SA exists and inspect its current permissions: ```shell kubectl --namespace get sa kubectl auth can-i --as=system:serviceaccount:: \ --namespace \ ``` For the `NoMatchError` variant: ```shell # Verify the CRD is actually installed: kubectl get crd | grep -E 'source.toolkit.fluxcd.io|jaas.metio.wtf' # If source-controller's CRDs are missing, install Flux: # https://fluxcd.io/flux/installation/ ``` ## Remediation Grant the missing verb to the tenant SA. The minimum verbs JaaS expects are documented in the [Tenancy and RBAC](https://jaas.projects.metio.wtf/usage/tenancy-and-rbac/#the-tenant-role) guide. 
Typical fix: ```yaml apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: namespace: name: jaas-tenant rules: - apiGroups: [source.toolkit.fluxcd.io] resources: [externalartifacts] verbs: [get, create, update, patch] - apiGroups: [source.toolkit.fluxcd.io] resources: [gitrepositories, ocirepositories, buckets, externalartifacts] verbs: [get] - apiGroups: [jaas.metio.wtf] resources: [jsonnetlibraries] verbs: [get, list] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: namespace: name: jaas-tenant subjects: - kind: ServiceAccount name: namespace: roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: jaas-tenant ``` After the RBAC change, force the next reconcile (the snippet's last spec edit doesn't auto-retrigger because the failure was non-transient): ```shell kubectl annotate jsonnetsnippet jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite ``` For the missing-CRD case, installing the CRD fires the operator's `crdWatcher`, which engages the watch automatically — no manual nudge needed. ## Why this is non-transient `Forbidden` doesn't recover by retry. The cluster operator (or whoever owns the tenant's RBAC) has to grant the verb. Retrying every 16 minutes would pile up wasted API calls and obscure the workqueue's signal. The non-transient classification lets the workqueue depth metric remain meaningful — anything on it is genuinely live work. `NoMatchError` is the same shape: until the CRD is installed, the kind doesn't exist. Retry can't conjure it. --- # Self-signed webhook cert renewal failing Source: https://jaas.projects.metio.wtf/runbooks/webhook-cert-renewal/ Fires when `jaas_webhook_cert_renewal_failures_total` has increased above the configured per-hour threshold. The `Renewer` background goroutine rotates the self-signed TLS material every `Validity / 3` (typically every few months for a year-long cert). 
When it can't, the existing cert keeps working until its natural expiry — at which point the apiserver stops trusting the chain and **every JsonnetSnippet admission fails cluster-wide with `x509` errors**. ## Symptom - `JaaSWebhookCertRenewalFailing` alert is firing (severity: critical). - Operator pod logs carry repeated `Self-signed webhook cert renewal failed` warnings at the `Renewer.Interval` cadence. - `kubectl describe validatingwebhookconfiguration ` shows a `caBundle` that hasn't rotated since the failures started. - The pod stays `Ready=True` — the renewer's failures don't gate the readiness probe. ## Diagnosis The most common causes, in order of frequency: ### Cause A — RBAC drift on the named VWC The operator's ClusterRole pins `resourceNames: []` on the `validatingwebhookconfigurations` patch verb. A chart upgrade that changes `operator.webhook.vwcName` (or a manual chart edit) leaves the running pod patching a name it no longer has permission for. ```shell kubectl auth can-i patch validatingwebhookconfiguration/ \ --as=system:serviceaccount:: ``` If the answer is "no", the chart's `operator-cluster` ClusterRole needs the current VWC name added to `resourceNames` (or the running pod restarted to pick up the new name). ### Cause B — VWC renamed out from under the operator A separate controller (admission policy automation, GitOps drift correction) renamed the VWC. The operator is patching a stale name. ```shell kubectl get validatingwebhookconfigurations \ --selector 'app.kubernetes.io/instance=' ``` If the live name differs from the operator's `--webhook-validating-config-name` flag, redeploy the operator with the correct flag or rename the VWC back. ### Cause C — `CertDir` gone read-only The chart mounts `CertDir` as an `emptyDir` by default. A `kubectl apply` that adds a `readOnlyRootFilesystem: true` security context or a sidecar that re-mounts the volume can break writes. 
```shell kubectl --namespace exec -- ls -l /tmp/k8s-webhook-server/serving-certs/ kubectl --namespace exec -- touch /tmp/k8s-webhook-server/serving-certs/.write-probe ``` If the touch fails, the security-context or volume mount needs fixing. ## Remediation 1. **Fix the root cause** (RBAC, name, or mount). 2. **Roll the operator pod** to force a fresh bootstrap of the cert and a re-patch of the VWC: ```shell kubectl --namespace rollout restart deployment ``` The new pod's bootstrap path goes through the dual-CA union (DD8), so existing replicas stay trusted across the rotation. 3. **Verify renewal is healthy** after the bounce — the `jaas_webhook_cert_renewal_failures_total` counter should stop increasing, and the alert clears once the `for:` window passes. ## When to consider switching to cert-manager If the self-signed renewer keeps tripping over your environment's RBAC story or pod-security policies, the chart supports `operator.webhook.certMode: cert-manager` — cert-manager handles the rotation and the operator mounts the resulting secret. Trade-off: requires cert-manager installed and an Issuer configured. --- # ServiceAccountMissing Source: https://jaas.projects.metio.wtf/runbooks/serviceaccountmissing/ ## Symptom `READY=False`, `REASON=ServiceAccountMissing`. ## Cause The snippet omitted `spec.serviceAccountName` AND the operator was started without `--default-service-account`. The operator refuses to reconcile a snippet with no effective ServiceAccount because every reconcile mints a tenant token from that SA — without one, there's nothing to impersonate. ## Diagnosis ```shell kubectl get jsonnetsnippet --output jsonpath='{.spec.serviceAccountName}' ``` Empty? Either the snippet must set it, or the cluster operator must configure a default. ## Remediation Pick one: 1. **Snippet-side (preferred for multi-tenant setups):** set `spec.serviceAccountName: ` on every snippet. Each tenant uses its own SA → least-privilege impersonation. 2. 
**Cluster-side (single-tenant clusters):** start the operator with `--default-service-account=`. Every snippet without an explicit SA impersonates this one. The default SA must exist in **every snippet's namespace** — the operator looks it up per-reconcile. --- # SourceFetchFailed Source: https://jaas.projects.metio.wtf/runbooks/sourcefetchfailed/ ## Symptom `READY=False`, `REASON=SourceFetchFailed`. The Message describes what went wrong (HTTP error, digest mismatch, tarball too large, etc.). ## Cause The Fetcher resolved the source CR and started downloading the artifact, but the download itself failed. Three subcategories: - **HTTP failure** — connection refused, 5xx from the source-controller endpoint, TLS handshake error - **Digest mismatch** — the bytes don't hash to `status.artifact.digest`. Possible truncation or in-flight tampering - **Tarball oversized** — extracted bytes exceed `MaxArchiveBytes` (default 64 MiB) ## Diagnosis Check the source CR's `status.artifact.url` is reachable from the operator pod: ```shell kubectl exec deploy/jaas -- wget -O- | wc -c ``` A connection refused means the storage endpoint of source-controller (or another publisher) is unreachable — usually a NetworkPolicy issue. For digest mismatches, the source CR has likely been republished mid-fetch — the next reconcile typically succeeds. For oversized tarballs, the snippet's `spec.sourceRef.path` filter is too broad — narrow it so only the files the snippet actually `import`s come through. ## Remediation - **Network**: fix the NetworkPolicy / DNS / TLS that's blocking the fetch - **Digest**: re-reconcile (manual: `kubectl annotate jsonnetsnippet jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite`) - **Oversized**: narrow `spec.sourceRef.path` to the subdirectory the snippet needs, or split the source repo --- # SourceNotReady Source: https://jaas.projects.metio.wtf/runbooks/sourcenotready/ ## Symptom `READY=False`, `REASON=SourceNotReady`. 
The Message names the source CR (`GitRepository/foo`, `ExternalArtifact/bar`, etc.). ## Cause The Flux source CR the snippet references exists but its own `status.conditions[Ready]` is not yet True (or `status.artifact` is empty). The operator refuses to fetch from a source it can't trust as ready. For chained snippets specifically: the upstream snippet may have failed reconciliation, so its ExternalArtifact is stale or unpopulated. ## Diagnosis ```shell kubectl describe # Look for the Ready condition and any error messages. ``` For Flux sources, also check the source-controller logs: ```shell kubectl --namespace flux-system logs deploy/source-controller | grep ``` ## Remediation Fix the upstream source. The operator watches Flux source kinds and will re-reconcile the snippet automatically when the source flips to Ready=True — no manual nudge required. --- # SourceRefNotYetSupported Source: https://jaas.projects.metio.wtf/runbooks/sourcerefnotyetsupported/ ## Symptom `READY=False`, `REASON=SourceRefNotYetSupported`. ## Cause The snippet sets `spec.sourceRef` but the operator was built without a Fetcher wired in. This is a mis-deployment in practice — the production binary always wires `sources.New()`. Seeing this in a real cluster means you're running: - a test/dev binary - a custom build where `defaultBuilder` was modified - a future code path that hasn't enabled sourceRef yet ## Diagnosis ```shell kubectl logs deploy/jaas | grep -i "fetcher" ``` If the operator logs no Fetcher initialization, the binary is incomplete. ## Remediation Use a release binary, or convert the snippet to `spec.files` inline as a temporary workaround. --- # Storage backend recovery Source: https://jaas.projects.metio.wtf/runbooks/storage-recovery/ Not tied to a single `Reason` — this page covers what to do when the artifact store itself is degraded (PVC lost, S3 endpoint unavailable, the storage HTTP server is down).
Downstream Flux consumers (kustomize-controller, helm-controller, grafana-operator) dereference `ExternalArtifact.status.artifact.url` to fetch tarballs; when that URL stops returning bytes, dependent resources stall. ## Symptom One or more of: - Downstream Flux consumers report `404 Not Found` or `connection refused` against the JaaS storage URL. - `kubectl get externalartifact --all-namespaces` shows resources whose URL is unreachable from the consumer pods. - The operator pod is healthy (Ready=True on snippets), but the storage Service is unresponsive. - `helm upgrade` of the chart from `persistence.enabled: false` to `true` — or vice versa — caused a gap. ## Triage: which backend are you running? ```shell kubectl --namespace get deploy jaas \ --output jsonpath='{.spec.template.spec.containers[0].args}' \ | tr ',' '\n' | grep -E 'storage-backend|storage-path|s3-endpoint' ``` - `--storage-backend=local` → filesystem behind `--storage-path`. Either an emptyDir (chart default) or a PVC. - `--storage-backend=s3` → an external S3-compatible bucket; the storage HTTP server in-pod is a thin streaming proxy over `minio-go`. ## Filesystem backend ### PVC lost or replaced Symptom: every `ExternalArtifact` URL returns 404 even though the snippet's Ready=True. The Publisher writes idempotently on every reconcile, so making the operator re-render every snippet is the fix: ```shell # Roll the operator — the cache is rebuilt from the apiserver and every # snippet is reconciled. Each reconcile re-runs the Publisher, which # writes the tarball back to disk. With a clean PVC, the gap closes in # one reconcile loop. kubectl --namespace rollout restart deploy/jaas ``` If reconciles do not produce tarballs again, force a reconcile per snippet: ```shell kubectl annotate --all-namespaces jsonnetsnippet --all \ jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite ``` The window between PVC loss and the first re-render is the only outage downstream consumers see. 
With `replicas.max: 1` (chart default) that window is bounded by the rollout time; with multi-replica HA + RWX PVC, the lease-holder writes immediately and the gap is sub-second. ### emptyDir reset (pod restart) `persistence.enabled: false` is fine for low-stakes deployments but every pod restart re-renders every snippet. The "fix" is to enable persistence: ```shell helm --namespace upgrade jaas oci://ghcr.io/metio/helm-charts/jaas \ --reuse-values \ --set operator.storage.persistence.enabled=true \ --set operator.storage.persistence.size=10Gi ``` After the upgrade, follow the PVC-lost steps above to repopulate the new volume. ### Storage HTTP server unreachable but operator healthy Diagnose: ```shell kubectl --namespace port-forward svc/jaas-storage :8082 & curl -fsSL http://localhost:///.tar.gz | wc -c ``` If port-forward works but in-cluster fetches fail, look at NetworkPolicy: ```shell kubectl get networkpolicy --all-namespaces | grep -i jaas ``` The chart's optional NetworkPolicy locks the storage port to a single source-controller selector. If your Flux install lives elsewhere or carries different labels, the NetworkPolicy will silently drop the traffic. Either widen `networkPolicy.fromSourceControllerSelector` or disable the NetworkPolicy on this chart and rely on a cluster-wide policy. ## S3 backend ### Endpoint unreachable / 5xx from the provider The pod-side storage HTTP server is a proxy. When the upstream S3 endpoint is down, the proxy returns 502/504 and downstream Flux consumers retry with backoff. Operator pod health is unaffected. Diagnose: ```shell # Pull a recent tarball directly to confirm it's the upstream kubectl --namespace exec deploy/jaas -- \ wget -O- http://localhost:8082///.tar.gz | wc -c ``` If the in-pod fetch fails too, check the operator logs for `minio-go` errors. Auth problems (expired session token, rotated access key) show up here distinctly from network problems. 
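
The status codes named on this page map onto its sections, and a small wrapper around `curl` can make that mapping explicit. This is a minimal sketch: `classify_storage_status` and `probe_artifact_url` are hypothetical helper names, and the URL to probe is whatever the `ExternalArtifact` reports in `status.artifact.url`. The mapping only restates what this runbook says about 404, 502/504, and refused connections.

```shell
#!/bin/sh
# Hypothetical triage helpers; the mapping restates this runbook, nothing more.

# classify_storage_status HTTP_CODE
classify_storage_status() {
  case "$1" in
    200)     echo "ok: artifact is being served" ;;
    404)     echo "missing: artifact not in the store; re-render the snippets" ;;
    502|504) echo "upstream: S3 endpoint unreachable behind the proxy" ;;
    000)     echo "unreachable: no HTTP response; check Service and NetworkPolicy" ;;
    *)       echo "other: HTTP $1; check the operator logs" ;;
  esac
}

# probe_artifact_url URL
#   curl reports 000 as the status when no HTTP response was received.
probe_artifact_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1") || true
  classify_storage_status "$code"
}
```

Running `probe_artifact_url` once from inside the operator pod's network and once from a consumer's namespace separates a store problem (both fail the same way) from a NetworkPolicy problem (only the consumer side fails).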
### Bucket gone or wrong prefix

If the bucket was emptied or the `--s3-prefix` changed, the proxy returns 404 even though the snippet is Ready. Re-render every snippet to repopulate:

```shell
kubectl annotate --all-namespaces jsonnetsnippet --all \
  jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite
```

The Publisher writes idempotently — running this against a working bucket is safe.

### Credentials rotated

With static credentials (`--s3-access-key` / `--s3-secret-key` / inline chart values), a rotation requires a Deployment restart for `minio-go` to pick up the new values. With IAM/IRSA, the discovery chain re-reads at request time — the operator picks the new identity up automatically.

Force the new keys to take effect:

```shell
kubectl --namespace rollout restart deploy/jaas
```

## Disk full (`ENOSPC`)

When the volume backing `--storage-path` fills up, `Store.Put` returns the kernel's `ENOSPC` verbatim → `Publisher.Publish` wraps it → no specific sentinel matches → classified as transient `ReasonSourceFetchFailed` → controller-runtime backoff retries forever at the ~16 min cap. The operator pod stays healthy; every snippet using local storage starts looping.

### Symptom

- Multiple snippets simultaneously flip to `Ready=False` with messages mentioning `no space left on device`.
- `JaaSControllerWorkqueueDepthHigh` alert fires (the backoff queue saturates).
- `kubectl --namespace exec deploy/jaas -- df -h /var/lib/jaas/artifacts` shows the volume at 100%.

### Recovery

1. **Free space.** Resize the PVC (if `operator.storage.persistence.enabled: true`), increase `operator.storage.sizeLimit` (emptyDir), or prune retained revisions:

   ```shell
   # Lower spec.history on noisy snippets so the next reconcile prunes
   # older revisions. The Publisher's Prune step removes everything
   # outside the keep-set, freeing space proportional to the change.
   kubectl patch jsonnetsnippet --type=merge --patch '{"spec":{"history":1}}'
   ```

   For an immediate flush, force-prune by removing the artifact directory of a snippet you're certain doesn't need its history:

   ```shell
   # The path must end in that one snippet's artifact directory.
   # Do not run this against the store root.
   kubectl --namespace exec deploy/jaas -- \
     rm -rf /var/lib/jaas/artifacts//
   ```

2. **Drive reconciliation.** The backoff cap is ~16 min, so failing snippets retry within that window automatically. To force immediate re-render:

   ```shell
   # Annotate every snippet that flipped to Ready=False:
   kubectl get jsonnetsnippets --all-namespaces \
     --output jsonpath='{range .items[?(@.status.conditions[?(@.type=="Ready")].status=="False")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' \
     | xargs -n1 -I {} sh -c 'ns=$(echo "{}" | cut -d/ -f1); n=$(echo "{}" | cut -d/ -f2); \
       kubectl --namespace "$ns" annotate jsonnetsnippet "$n" \
       jaas.metio.wtf/reconcile-at=$(date -u +%FT%TZ) --overwrite'
   ```

3. **Prevention.** Set `operator.storage.maxArtifactBytes` to cap individual snippet renders before they hit the disk. Use `JaaSSnippetArtifactGrowing` (opt-in PrometheusRule alert) to catch creep before saturation.

## OOM during render (kubelet kills the operator pod mid-publish)

A snippet that evaluates into a multi-MB JSON tree can push the operator past its `resources.memory` limit. The kubelet SIGKILLs the pod. Effects:

- The pod restarts cleanly. Probes flip back to Ready within seconds.
- The leader-election lease is dropped on the killed pod's process death; the next replica picks it up immediately (or, on single-replica installs, the new pod re-acquires).
- **Mid-write `.tmp` files** in the local store are orphaned because the `Store.Put` was interrupted between `Create` and the atomic `Rename`. The background `Sweep` (default every 10 min, configurable via `operator.storage.sweep.interval`) cleans them up once their `ModTime` falls outside the `maxTmpAge` window.
- **The snippet that triggered the OOM** keeps failing in the same way on the next reconcile because its rendered output is what blew memory in the first place. The pod loop-restarts until either the snippet is fixed or its memory cost falls below the limit.

### Symptom

- Operator pod restarts every few minutes; `kubectl describe pod` shows `Last State: Terminated, Reason: OOMKilled`.
- `JaaSOperatorPodDown` alert fires (the restart window is shorter than the recovery, so probes flap).
- One specific snippet correlates with each restart.

### Diagnose

The killing snippet is whichever one the operator was reconciling when memory peaked. The easiest way to identify it:

```shell
kubectl --namespace logs deploy/jaas --previous --tail=200 \
  | grep -B2 -i 'reconcil\|publish' | tail -30
```

The last `Reconcile` log line before the kill names the snippet. Confirm via `jaas_snippet_rendered_bytes` if the operator's metrics endpoint was reachable before the kill (the histogram captures bytes per Synced reconcile; a runaway snippet stands out).

### Remediate

1. **Cap the runaway snippet.** Set `operator.storage.maxArtifactBytes` cluster-wide to refuse renders past a threshold (e.g., `16777216` for 16 MiB). The Publisher fails the snippet with `ReasonArtifactTooLarge` instead of attempting the write. The operator pod stops OOM-restarting.
2. **Raise operator memory.** The chart's `resources.memory` default is conservative (`64Mi`); a cluster with large rendered artifacts may need `256Mi` or more. Update via:

   ```shell
   helm --namespace upgrade jaas oci://ghcr.io/metio/helm-charts/jaas \
     --reuse-values --set resources.memory=256Mi
   ```

3. **Wait for `Sweep`.** The orphan `.tar.gz.tmp` files clear automatically once the surrounding issue is fixed and `maxTmpAge` (default 30 min) elapses. `jaas_storage_sweep_failures_total` flags any persistent issue.
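
The byte value in step 1 is plain arithmetic, so it can be computed rather than typed from memory. A minimal sketch, assuming a POSIX shell; `mib_to_bytes` is a hypothetical helper name, not something the chart ships:

```shell
# Hypothetical helper: convert a size in MiB to the raw byte count
# used for operator.storage.maxArtifactBytes in the example above.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 16   # prints 16777216, the 16 MiB example from step 1
```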
For S3 backends, OOM during a multipart upload leaves an incomplete upload at the S3 endpoint. Whether it expires automatically depends on the provider: AWS S3 keeps incomplete uploads until they are aborted unless a bucket lifecycle rule with `AbortIncompleteMultipartUpload` removes them, so confirm such a rule exists. No JaaS-side action is needed.

## Multi-replica considerations

With leader election on (the chart default when operator mode is enabled), only the lease-holder writes to storage. A storage incident on the lease-holder is the worst case: the standby reads but cannot fill the gap until the lease transfers.

To force a handover during a storage incident on one replica:

```shell
kubectl --namespace delete lease -operator
```

The next replica acquires the lease within `LeaseDuration` (15s default), and its Publisher writes against its own (presumably healthy) view of the backend.

## Prevention

- Use `persistence.enabled: true` in production. Default-off is for quick demos.
- Run the chart's opt-in PrometheusRule (`operator.metrics.prometheusRule.enabled: true`) — `JaaSSnippetArtifactGrowing` catches runaway tarballs before they fill the PVC.
- Set `operator.storage.maxArtifactBytes` to cap pathological snippets at admission time, not after they've written to disk.
- For S3, configure a bucket lifecycle policy that does not delete tarballs the operator still considers live. The Publisher's `Prune` only deletes revisions the snippet's `status.history` no longer references.

## `JaaSStorageSweepFailures` alert

Linked from the alert by name. The sweep is a background GC that removes orphaned `.tar.gz.tmp` residue left by Puts whose process died after the tmpfile landed but before the rename. The reconcile hot path is unaffected — Put still works — but stale `.tmp` files accumulate until the underlying issue is fixed.

**Symptom:** `jaas_storage_sweep_failures_total` increases over time; `JaaSStorageSweepFailures` alert fires after >3 failures/hour (configurable).
**Diagnose:**

```shell
# Operator logs carry the underlying sweep error:
kubectl --namespace logs deploy/jaas --tail=200 | grep "Storage sweep failed"

# For local backend: check the volume's free space + permissions.
kubectl --namespace exec deploy/jaas -- df -h /var/lib/jaas/artifacts
kubectl --namespace exec deploy/jaas -- ls -la /var/lib/jaas/artifacts

# For S3 backend: sweep is a no-op (Put is atomic, no .tmp residue),
# so this alert firing on S3 is a wiring bug. Confirm backend:
kubectl --namespace get deploy jaas \
  --output jsonpath='{.spec.template.spec.containers[0].args}' \
  | tr ',' '\n' | grep storage-backend
```

**Remediate:**

- Disk full → increase the PVC size or shrink `operator.storage.maxArtifactBytes`.
- Permission errors → ensure the operator's `securityContext.runAsUser` matches the PVC's filesystem ownership; reset with `chown` on a one-shot Job.
- Sustained S3 listing throttling → unlikely on the local backend; for S3 this alert shouldn't fire at all because Sweep is a no-op there.

Manual cleanup once the underlying issue is fixed:

```shell
kubectl --namespace exec deploy/jaas -- find /var/lib/jaas/artifacts -name '*.tar.gz.tmp' -mmin +30 -delete
```

## `WithdrawForced` event on snippet deletion

If a snippet stuck in `Terminating` carries a `Warning WithdrawForced` Kubernetes Event, the operator has already done what it could — the finalizer was dropped after `--max-withdraw-wait` (default 1h) of failing Withdraws against the backend, and the snippet itself is GC'd. The tarball it owned is now orphaned in storage.

To clean up:

```shell
# Read the elapsed time + last backend error from the event message:
kubectl describe jsonnetsnippet

# Locate the orphan in the configured backend:
#   local: <--storage-path>///.tar.gz
#   s3:    <--s3-prefix>///.tar.gz
# Remove it once the backend is reachable again.
```

A force-drop should be rare — it means the backend was broken for the full wait window.
Investigate **why** before lowering `maxWithdrawWait`: aggressive timeouts make transient apiserver/S3 incidents cause orphans you'd otherwise have recovered from naturally.

## Related runbooks

- [artifacttoolarge.md](artifacttoolarge.md) — one snippet's output exceeds the cap (different symptom: snippet Ready=False, not "URL unreachable")
- [sourcefetchfailed.md](sourcefetchfailed.md) — JaaS *consuming* an upstream artifact, not its own storage

---

# Suspended

Source: https://jaas.projects.metio.wtf/runbooks/suspended/

## Symptom

`READY=False`, `REASON=Suspended`. The snippet's `spec.suspend` is set to `true`.

## Cause

A human operator (or automation) paused reconciliation for this snippet, typically to investigate a downstream issue without the artifact being rewritten underneath them. The previously-published `ExternalArtifact` and the on-disk tarball are left intact — downstream Flux consumers continue serving the last successful render.

This is a normal, intentional state. It is not a failure.

## Diagnosis

```shell
kubectl get jsonnetsnippet --output jsonpath='{.spec.suspend}'
```

If the value is `true`, the suspension is set on the spec. Check `kubectl describe` for the last condition transition timestamp to see when it happened.

## Remediation

To resume reconciliation:

```shell
kubectl patch jsonnetsnippet --type=merge --patch '{"spec":{"suspend":false}}'
```

Or remove the field entirely:

```shell
kubectl edit jsonnetsnippet
# delete the `suspend: true` line under spec
```

The next reconcile picks up the snippet's current spec and republishes if anything has drifted.

---

# Synced

Source: https://jaas.projects.metio.wtf/runbooks/synced/

## Symptom

`kubectl get jsonnetsnippet` shows `READY=True` with `REASON=Synced`. This is the healthy state — no action required.

## Cause

The most recent reconcile pass completed end-to-end: source resolved, libraries resolved, eval succeeded, tarball published, ExternalArtifact upserted.
## Diagnosis

To inspect the published artifact:

```shell
kubectl get externalartifact --output yaml
```

The `status.artifact.url` points at the operator's storage HTTP server. Curl it from a pod in the cluster to confirm the bytes match:

```shell
kubectl run --rm --stdin --tty --restart=Never tmp --image=docker.io/curlimages/curl -- \
  curl -sL | tar tzv
```

## Remediation

None — this is the healthy state.

---

# Watch-layer silent failure

Source: https://jaas.projects.metio.wtf/runbooks/operator-watch-silent/

Not tied to a per-snippet `Reason`. This page covers the one RBAC-denial path the reconciler cannot surface itself: when the **operator's own** ClusterRole is missing a verb on a watched resource kind, controller-runtime's informer fails to start its watch, logs the failure, and retries silently. The reconciler never sees the failure — and no snippet's status condition will tell you about it.

If a per-snippet runbook (`rbacdenied.md`, `sourcefetchfailed.md`, `sourcenotready.md`) doesn't match the symptoms, this is where to look next.

## Symptom

- Snippets that worked yesterday stop receiving watch-driven re-renders. They still reconcile on edits or `spec.interval` ticks, but not on upstream source changes.
- `kubectl describe jsonnetsnippet` shows healthy or stale state — never `Reason=RBACDenied` or any other failure.
- Operator pod is `Ready=True`, all probes pass.
- A Flux `GitRepository` (or `OCIRepository` / `Bucket` / `ExternalArtifact` / `JsonnetLibrary`) advances its `status.artifact` but no JaaS reconcile fires.

## Why JaaS can't tell you directly

controller-runtime's informer is what watches resource kinds at the apiserver. If the operator SA lacks `list`/`watch` on a kind:

1. The informer's initial LIST returns `Forbidden`.
2. controller-runtime logs `Failed to watch *v1.X: forbidden: ...` at error level.
3. The informer retries with exponential backoff — forever.
4. The reconciler's reconcile loop never gets events from that kind.
5. The reconciler itself is unaware that the watch is non-functional. The "no events arriving" condition is indistinguishable from "no actual changes upstream."

This is the one diagnostic surface the operator can't unify with its other RBAC-denial paths (Fetcher / library Get / Publisher write — all per-reconcile and surfaced via `Reason=RBACDenied`).

## Diagnosis

The smoking gun is in the operator's logs:

```shell
kubectl --namespace logs deploy/jaas --tail=2000 \
  | grep -E 'Failed to watch|"reflector.go"' \
  | head -30
```

Expected output if the watch layer is healthy: nothing. If broken, you'll see lines like:

```text
E0610 12:34:56.789 reflector.go:227 "Failed to watch" err="failed to list *v1.GitRepository: ... forbidden: ServiceAccount \"jaas\" cannot list resource \"gitrepositories\" in API group \"source.toolkit.fluxcd.io\" at the cluster scope" type="*v1.GitRepository"
```

The error names the SA, verb, resource, and API group — that's the exact gap to close.

Check what the operator SA can actually do:

```shell
kubectl auth can-i list gitrepositories.source.toolkit.fluxcd.io \
  --as=system:serviceaccount::jaas
```

Compare against the chart-rendered ClusterRole:

```shell
kubectl get clusterrole -operator --output yaml | grep -A2 source.toolkit.fluxcd.io
```

## Remediation

The operator's ClusterRole verbs are defined in the chart's `templates/clusterrole-operator.yaml` (in the metio/helm-charts repo, under `charts/jaas/`). Three causes warrant separate fixes:

### 1. Chart upgraded with `rbac.create: false`

Someone disabled chart-rendered RBAC (`operator.rbac.create: false`) and the external RBAC source missed a verb. Either re-enable chart-rendered RBAC, or update whatever owns the ClusterRole to grant the missing verbs.

### 2. Manual chart edit removed verbs

A `kubectl edit clusterrole -operator` or a hand-rolled overlay removed verbs the chart originally rendered. Restore via `helm upgrade --install` (idempotent for an installed chart).

### 3. New source kind added but chart's drift gate didn't catch it

The drift-gate test in the chart's `tests/clusterrole-operator_test.yaml` (metio/helm-charts, under `charts/jaas/`) — the "ClusterRole drift gate" case — is supposed to fail at PR time if `operator.FluxSourceKinds` adds a kind without a matching ClusterRole entry. If you reach this runbook page because of a new kind, **it means the test passed but production RBAC is still missing it** — investigate why (test bypassed, chart drift, etc.). Add the verb manually as a hotfix, then file a bug against the drift gate.

After granting the verb, restart the operator pod so a fresh informer picks up the new ServiceAccount-token permissions:

```shell
kubectl --namespace rollout restart deploy/jaas
```

Watch-driven re-renders resume within seconds.

## Why not detect this automatically

A startup probe (operator does a test `LIST` per kind and refuses to boot on Forbidden) was considered and rejected:

- It would block startup on transient apiserver flakes during deploys.
- The CRD-watcher pattern already handles the missing-CRD case gracefully (the `crdWatcher` engages a watch dynamically when the CRD becomes `Established=True`). Layering "and also fail on Forbidden" complicates that contract.
- A misconfigured cluster should surface the issue via the existing logs + the `kubectl auth can-i` workflow, which is the standard k8s troubleshooting path.

The diagnostic trail above is the supported recovery story. If a user reports hitting this in the wild and finds the log-grep step too obscure, a follow-up is a `jaas_informer_watch_failures_total` Prometheus counter plus a `JaaSInformerWatchFailing` alert — same shape as `JaaSStorageSweepFailures` from the storage layer. Track in `open-items.md` if it comes up.

## Related runbooks

- [rbacdenied.md](rbacdenied.md) — per-reconcile RBAC denials the reconciler CAN surface (tenant SA can't read a source CR / library, can't write ExternalArtifact).
  If a snippet's status says `Reason=RBACDenied`, start there instead.
- [storage-recovery.md](storage-recovery.md) — different failure surface (storage backend rather than apiserver), same "graceful degradation, diagnosis via logs + metrics" shape.

---

# Workqueue saturation

Source: https://jaas.projects.metio.wtf/runbooks/workqueue-saturation/

Linked from the `JaaSControllerWorkqueueDepthHigh` alert. Fires when the reconciler's workqueue holds more items than the configured threshold (default 50) for the alert window. Not tied to a `Reason` constant — workqueue depth is a controller-runtime signal, not a per-snippet status.

## Symptom

```text
ALERTS{alertname="JaaSControllerWorkqueueDepthHigh", controller="jsonnetsnippet"}
```

- New snippet writes settle slowly (status takes minutes to flip, not seconds).
- Existing snippets re-render later than `spec.interval` would suggest.
- `kubectl describe jsonnetsnippet` shows a stale `ObservedGeneration`.

## Cause

The operator is dequeuing reconciles slower than the API server enqueues them. Common causes, in observed-frequency order:

1. **Slow Publisher backend.** S3 throttling, a slow PVC, or a stalled object-store transaction — each reconcile blocks on the storage `Put`.
2. **API server pressure.** The cluster's apiserver is slow on `GET` / `UPDATE` (often during a control-plane upgrade or under heavy general load).
3. **Per-snippet rate-limiter exhaustion.** A flapping snippet eats its token-bucket budget; the controller's exponential backoff stretches the queue.
4. **A large fan-out from a single source watch.** One Flux source republishes and 100 snippets reference it; every snippet's reconcile lands in the queue at once.
5. **Webhook latency.** When `--enable-webhook` is on, every snippet write traverses the validating webhook. A wedged webhook (cert issue, slow tenant client) holds the apiserver's call open and indirectly enlarges the queue.
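
Whatever the cause, the alert condition itself is a threshold on queue depth, which can be reproduced offline against a saved metrics scrape. The sketch below assumes the Prometheus text exposition format without trailing timestamps and the `workqueue_depth` metric name; the function name `queues_over_threshold` and the sample scrape are illustrative, not JaaS output.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print every workqueue_depth series whose value exceeds the threshold
# (default 50, matching the alert default described above).
queues_over_threshold() {
  awk -v limit="${1:-50}" '
    /^workqueue_depth[{ ]/ && $NF + 0 > limit + 0 { print $0 }
  '
}

# Illustrative scrape, not real JaaS output.
printf '%s\n' \
  '# TYPE workqueue_depth gauge' \
  'workqueue_depth{name="jsonnetsnippet"} 73' \
  'workqueue_depth{name="jsonnetlibrary"} 2' \
  | queues_over_threshold 50
```

Against a live operator, pipe the output of a metrics scrape into the function instead of the sample lines.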
## Diagnosis

```shell
# Per-controller queue depth — confirm which controller is saturated
kubectl --namespace port-forward svc/jaas-metrics 8083:8083 &
curl -s localhost:8083/metrics | grep -E 'workqueue_depth|workqueue_adds_total'

# Reconcile-time histogram — separates "lots of queued items" (fan-out)
# from "each reconcile is slow" (storage / apiserver).
curl -s localhost:8083/metrics | grep 'controller_runtime_reconcile_time_seconds'
```

Cross-reference operator logs for the slow path:

```shell
kubectl --namespace logs deploy/jaas --tail=500 \
  | grep -E 'reconcile|publisher|s3|webhook'
```

If `controller_runtime_reconcile_time_seconds` p99 is also high, the alert is the symptom — `JaaSReconcileLatencyHigh` is the more useful page; see [reconcile-latency.md](reconcile-latency.md).

## Remediation

- **Storage backend slow.** Switch from `local` (PVC) to `s3` for higher write throughput, or vice versa if S3 is throttled. See [storage-recovery.md](storage-recovery.md).
- **Apiserver slow.** Pause spec-update churn (`spec.interval` longer on hot snippets), then wait for control-plane health to return.
- **Rate-limiter exhaustion.** Increase `operator.rerenderBurst` to absorb the spike, then investigate why a snippet is flapping (typically a `Reason*` other than `Synced` keeps firing — check `kubectl get events`).
- **Fan-out from a single source.** Stagger snippet intervals so their watch events don't all settle at once. The controller serializes per-snippet; concurrency across snippets is bounded by `MaxConcurrentReconciles` (set high enough at chart default — 5 — that drag from a single fan-out is unusual).
- **Webhook latency.** `kubectl get validatingwebhookconfiguration jaas-jsonnetsnippet --output yaml` and confirm the `caBundle` is current; restart the operator if the cert was rotated externally.

## Prevention

- Run `operator.metrics.prometheusRule.enabled: true` so this alert fires *before* downstream consumers notice.
- Cap `--max-artifact-bytes` so a runaway snippet can't slow every Publisher write behind it.
- For multi-replica HA, leader election keeps only one replica reconciling — workqueue depth on the lease-holder is the only one that matters.

---

# Contributing

Source: https://jaas.projects.metio.wtf/contributing/

Build and test JaaS inside a containerized dev shell — the only host requirement is a container runtime — and understand the CI gate and the calendar-based release pipeline.

Source, issues, and releases live at [github.com/metio/jaas](https://github.com/metio/jaas).

---

# Building and Testing

Source: https://jaas.projects.metio.wtf/contributing/building/

The host needs no Go toolchain. All build and test commands run inside a containerized dev shell defined by `dev/Containerfile` and driven by `ilo`. The `.ilo.rc` at the repo root supplies the shell arguments, so the short form is always available:

```shell
ilo bash -c ''
```

The dev shell pre-installs the Go toolchain, all static-analysis tools, `controller-gen`, and the envtest asset bundle. The Go module cache is mounted from `${XDG_CACHE_HOME}/go` so repeated builds do not re-download dependencies.

## Building

```shell
ilo bash -c 'go build -o jaas .'
```

The `Dockerfile` builds the production image. It accepts `VERSION` and `COMMIT` build args:

```shell
docker build -t ghcr.io/metio/jaas:dev .
docker build --build-arg VERSION=v1 --build-arg COMMIT=abc123 -t ghcr.io/metio/jaas:dev .
```

## Regenerating generated code

CRD manifests and the `zz_generated.deepcopy.go` file are generated by `controller-gen`. Run this after touching `api/v1/` types:

```shell
# Regenerate deep-copy functions
ilo bash -c 'controller-gen object paths=./api/v1/...'

# Regenerate CRD manifests under config/crd/bases/
ilo bash -c 'controller-gen crd paths=./api/v1/... output:crd:dir=./config/crd/bases'
```

## Static analysis

golangci-lint is not used.
The standalone tools below run directly, both in CI and locally inside the dev shell:

```shell
ilo bash -c 'go vet ./...'
ilo bash -c 'staticcheck ./...'    # config: staticcheck.conf, checks = ["all"]
ilo bash -c 'gofumpt -l .'         # empty output means clean; any output is a failure
ilo bash -c 'gosec ./...'          # inline #nosec justifications silence false positives
ilo bash -c 'govulncheck ./...'    # reachable-from-code advisories only
ilo bash -c 'arch-go'              # architecture rules; config: arch-go.yml
```

## Test layers

### Pure unit tests

Table-driven tests with no external state. They live next to the code they cover across `internal/...` and `api/v1/`. Several act as drift gates: `conditions_test.go` verifies that every `Reason*` constant has a matching `docs/runbooks/.md`, and `TestErrorResponse_StableCodeValues` pins the wire-level `ErrCode*` strings against accidental rename.

```shell
ilo bash -c 'go test -count=1 -race -cover ./...'
```

To run a single test by name:

```shell
ilo bash -c 'go test -count=1 -v -run TestName ./internal/handler/'
```

### Envtest-backed operator tests

Files named `envtest_*_test.go` (in `internal/operator/`, `internal/webhook/selfsigned/`, and `main_envtest_test.go` at the repo root) boot a real `kube-apiserver` and `etcd` via controller-runtime's `envtest` package and run the reconciler, webhook, and full `run(...)` function against them. The tests share one apiserver instance per test binary, guarded by a `sync.Once`, so the startup cost is paid once.

Each test `t.Skip`s when `KUBEBUILDER_ASSETS` is unset — there is no build tag. The dev shell pre-stages the envtest asset bundle via `setup-envtest` (pinned `ENVTEST_K8S_VERSION`) and exports `KUBEBUILDER_ASSETS`, so these tests run by default inside `ilo bash`. On a host without the bundle they silently skip.
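
Because the skip is silent, a green run outside the dev shell can hide the fact that no envtest case executed. One way to guard against that is sketched here, assuming a small wrapper is acceptable; `require_envtest_assets` is a hypothetical name, not a script in the repository, and the directory check is an added assumption about how the variable is used.

```shell
# Hypothetical guard: refuse to continue when KUBEBUILDER_ASSETS is unset
# or does not point at a directory, so envtest-backed tests cannot skip
# unnoticed.
require_envtest_assets() {
  if [ -z "${KUBEBUILDER_ASSETS:-}" ] || [ ! -d "${KUBEBUILDER_ASSETS}" ]; then
    echo "KUBEBUILDER_ASSETS is not set to a directory; envtest tests would skip" >&2
    return 1
  fi
}

# Usage: run the guard first, then the normal test command.
#   require_envtest_assets && go test -count=1 -race -cover ./...
```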
The envtest harness sets `Config.SkipImpersonation` (the only place that setting is allowed) and defaults `MetricsBindAddress` to `"0"` so parallel test cases do not fight over the metrics port.

```shell
ilo bash -c 'go test -count=1 -race -cover ./...'
```

### Golden / example end-to-end tests

`examples_test.go` boots the full binary via `runInBackground` and asserts HTTP responses against golden files under `testdata/golden/`. Comparison is semantic — both sides are parsed as JSON and compared on the parsed values, so whitespace and key ordering are irrelevant.

After changing an example or adding a new one, regenerate the golden files:

```shell
ilo bash -c 'go test -update ./...'
```

Inspect and commit the diff in `testdata/golden/`.

### Fuzz tests

Fuzz targets in `internal/handler/`, `internal/sources/`, and `internal/urlguard/` harden the request path, the tar/gzip artifact unpacker, and the SSRF URL/IP parser against adversarial input. CI exercises their seed corpus as ordinary unit tests. To fuzz interactively:

```shell
ilo bash -c 'go test -fuzz=FuzzName -fuzztime=30s ./internal/urlguard/'
```

### Benchmarks

Throughput benchmarks in `internal/eval/`, `internal/storage/`, and `internal/operator/` cover reconcile throughput, watch mapping, and the tenant-client cache. They are baselines, not merge gates. The reconcile benchmark is envtest-backed and skips without `KUBEBUILDER_ASSETS`.

```shell
ilo bash -c 'go test -bench=. -benchmem -run=^$ ./internal/operator/'
```

### Kind operator smoke tests

The cluster-level layer runs outside `go test`. Pure-`kubectl` bash scenarios in `hack/smoke/` run against a real kind cluster via `.github/workflows/kind-smoke.yml`. To run a scenario locally against any reachable cluster, deploy JaaS and invoke the scenario scripts directly:

```shell
hack/smoke/scenario-inline-files.sh
```

See [CI and releases](/contributing/ci-and-release/) for how the smoke layer fits into the two-angle end-to-end strategy.

---

# CI and Releases

Source: https://jaas.projects.metio.wtf/contributing/ci-and-release/

## Static analysis

golangci-lint is not used. The tools below run directly, both in CI and locally inside the dev shell (see [Building and Testing](/contributing/building/)). Every tool is a separate, auditable binary with its own config file.

| Tool | Scope | Config |
|------|-------|--------|
| `go vet` (all analyzers) | Go correctness | — |
| [staticcheck](https://staticcheck.dev) | Bugs, simplifications, style | `staticcheck.conf` (`checks = ["all"]`) |
| [gosec](https://github.com/securego/gosec) | Security patterns | inline `#nosec` justifications |
| [govulncheck](https://go.dev/security/vuln/) | Known vulnerabilities in the dependency graph | — |
| [arch-go](https://github.com/arch-go/arch-go) | Architecture rules | `arch-go.yml` |
| [gofumpt](https://github.com/mvdan/gofumpt) | Strict formatting | — |
| [REUSE](https://reuse.software) | License / copyright metadata on every file | `REUSE.toml` |
| [yamllint](https://yamllint.readthedocs.io) | YAML | `.yamllint.yaml` |
| [actionlint](https://github.com/rhysd/actionlint) | GitHub Actions workflows | — |
| [markdownlint](https://github.com/DavidAnson/markdownlint-cli2) | Markdown | `.markdownlint.yaml` |
| [typos](https://github.com/crate-ci/typos) | Spelling | `.typos.toml` |
| [Trivy](https://github.com/aquasecurity/trivy) | Container image CVEs | — |

### Architecture rules

`arch-go.yml` pins two invariants enforced with 100% compliance:

- `api/v1` depends on neither the operator internals nor `sigs.k8s.io/controller-runtime`. The CRD types stay importable by external consumers without dragging the manager in. Scheme registration uses apimachinery's `runtime.NewSchemeBuilder` for exactly this reason.
- `internal/urlguard` — the SSRF-defence layer — depends on the standard library only, with no internal and no external imports.
  This keeps the IP/URL validation logic self-contained and straightforward to fuzz in isolation.

## The verify.yml PR gate

`.github/workflows/verify.yml` fans out into one job per concern. A failure points straight at the offending gate. CI installs each tool fresh via `go run @latest`; the dev shell pre-installs the same tools at the same versions, so local and CI runs agree.

| Job | What it runs |
|-----|--------------|
| `test` | `go build ./...` then `go test -v -cover ./...` |
| `lint-go` | `go vet ./...`, `staticcheck ./...`, `gosec ./...`, `gofumpt -l .` (fails on any output) |
| `vulnerabilities` | `govulncheck ./...` — reachable-from-code advisories are a hard merge gate |
| `architecture` | `arch-go` against `arch-go.yml` |
| `reuse` | `fsfe/reuse-action` — every file must carry SPDX headers |
| `yaml` | `yamllint` against `.yamllint.yaml` |
| `github-actions` | `actionlint` |
| `markdown` | `markdownlint-cli2` against `.markdownlint.yaml` |
| `typos` | `typos` against `.typos.toml` |
| `prose` | Vale against the shared metio/vale-config style; error-level findings (naming/branding) fail the gate |
| `container-image` | `docker buildx` build (load, no push) followed by Trivy scan; hard-fails on any fixable `CRITICAL`/`HIGH` |

### All-green aggregate

The workflow ends with a single `all-green` job:

- `needs` every other job
- runs `if: always()`
- fails unless each dependency `result` is `success` or `skipped`

That one job is the **only** check marked required in branch protection. Adding a new job to the `needs` list of `all-green` covers it automatically; no new required check needs to be registered.

The `govulncheck` gate is a hard blocker. A reachable-from-code advisory that cannot be fixed by bumping a dependency blocks the PR until resolved. Resolution is usually a `toolchain` bump in `go.mod` (for stdlib advisories) or `go get -u` (for module advisories).

## The release pipeline

Releases are calendar-based and automated.
`.github/workflows/release.yml` runs on a Monday cron (`47 7 * * MON`) plus manual `workflow_dispatch`. The version is computed from the run date:

```shell
date +'%Y.%-m.%-d'
```

For a Monday run on 2026-06-22 that produces `2026.6.22`.

goreleaser is not used. GPG is not used. The pipeline is hand-rolled across three jobs.

### prepare

Counts commits since the last release touching the build-relevant paths (`go.mod main.go internal api config Dockerfile`). Every downstream job gates on that count being non-zero (or there being no prior release at all), so an empty week publishes nothing.

### build

A cross-compile matrix over ten platform/arch combinations:

- `linux/amd64`, `linux/arm` (v7), `linux/arm64`, `linux/ppc64le`, `linux/riscv64`, `linux/s390x`
- `windows/amd64`, `windows/arm64`
- `darwin/amd64`, `darwin/arm64`

Each platform compiles with:

```shell
CGO_ENABLED=0 go build -trimpath \
  -ldflags="-s -w -X main.version= -X main.commit=" .
```

Archives are `tar.gz` on linux/darwin and `zip` on windows (with a `.exe` binary), each bundling `LICENSE` and `README.md`.

### container

A single `docker buildx` multi-arch push to `ghcr.io/metio/jaas:{latest,}` over the six linux arches. The `Dockerfile` builder is pinned to `$BUILDPLATFORM` and cross-compiles via Go's `GOARCH`, so the multi-arch build needs no QEMU. SBOM and provenance are attached.

The image is signed with cosign keyless immediately after push:

```shell
cosign sign \
  --yes \
  --annotations "repo=metio/jaas" \
  --annotations "workflow=Automated Release" \
  ghcr.io/metio/jaas@
```

Identity is proven by the workflow's OIDC certificate issued by Fulcio; there is no key to distribute.

### github

Gates on both `build` and `container` succeeding. Downloads all platform archives, computes a single `SHA256SUMS` over them, signs it with cosign keyless (Sigstore bundle format), and publishes the GitHub Release with all archives, the checksum file, and the bundle attached.
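
The checksum step can be reproduced locally to see what a single `SHA256SUMS` file over several archives looks like and how it catches a modified download. This is a sketch using GNU coreutils `sha256sum` on stand-in files, not the workflow's actual commands; the archive names are made up.

```shell
# Work in a scratch directory with two stand-in "archives".
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'linux build\n'  > example_linux_amd64.tar.gz
printf 'darwin build\n' > example_darwin_arm64.tar.gz

# One checksum file covering every archive.
sha256sum -- *.tar.gz > SHA256SUMS

# Verification succeeds while the files are untouched ...
sha256sum -c SHA256SUMS

# ... and fails as soon as one archive changes.
printf 'tampered\n' >> example_linux_amd64.tar.gz
sha256sum -c SHA256SUMS || echo "checksum mismatch detected"
```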
To verify a release download:

```shell
cosign verify-blob jaas__SHA256SUMS \
  --bundle jaas__SHA256SUMS.bundle \
  --certificate-identity-regexp '^https://github.com/metio/jaas/\.github/workflows/release\.yml@refs/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
sha256sum -c jaas__SHA256SUMS
```

To verify the container image:

```shell
cosign verify ghcr.io/metio/jaas: \
  --certificate-identity-regexp '^https://github.com/metio/jaas/\.github/workflows/release\.yml@refs/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
```