Health Subsystem — Orkestra

Orkestra’s health subsystem is a lightweight HTTP server that exposes Kubernetes-native probe endpoints and per-CRD health state. It starts before anything else and stays alive until shutdown.

Operator-level probes

The server exposes four operator-level endpoints, three Kubernetes probes plus a Prometheus metrics endpoint:

Endpoint	Probe type	Returns
`GET /startup`	startupProbe	200 once startup is complete; 503 while booting
`GET /health`	livenessProbe	200 when healthy; 500 on fatal condition
`GET /ready`	readinessProbe	200 when ready; 503 during startup and shutdown
`GET /metrics`	—	Prometheus metrics

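The /startup probe keeps liveness and readiness probes from running too early. Kubernetes holds off both checks until /startup returns 200, so a slow boot cannot trigger a liveness restart and the pod receives no traffic before it is ready. If /startup keeps failing past the probe's configured failure threshold, Kubernetes restarts the container.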
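The status codes in the table above can be sketched as a small handler factory. This is a minimal illustration, not Orkestra's implementation: `probeState`, `statusHandler`, `newProbeMux`, and `probe` are hypothetical names, and the flags are assumed to be plain atomic booleans.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probeState is a hypothetical stand-in for the health server's flags.
type probeState struct {
	startup atomic.Bool // true once startup is complete
	healthy atomic.Bool // false after a fatal condition
	ready   atomic.Bool // false during startup and shutdown
}

// statusHandler answers 200 while the flag is set and failCode otherwise.
func statusHandler(flag *atomic.Bool, failCode int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if flag.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(failCode)
	}
}

// newProbeMux wires the three probe paths to the codes listed in the table.
// Method filtering (GET only) is omitted to keep the sketch short.
func newProbeMux(s *probeState) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/startup", statusHandler(&s.startup, http.StatusServiceUnavailable))
	mux.Handle("/health", statusHandler(&s.healthy, http.StatusInternalServerError))
	mux.Handle("/ready", statusHandler(&s.ready, http.StatusServiceUnavailable))
	return mux
}

// probe issues an in-memory GET against h and returns the response code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

With `healthy` set and the other two flags clear, `probe(mux, "/health")` returns 200 while `/startup` and `/ready` return 503, which matches the booting phase described above.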

Operator health states

The server tracks four independent flags:

State	Set when	Clears when
`started`	HTTP server binds and begins serving	Never once set
`startup`	`SetStartupComplete()` is called	Never once set
`healthy`	HTTP server starts	`Unhealthy()` is called (fatal condition)
`ready`	`SetReady()` is called	`Degraded()` or `Shutdown()` is called

Degraded() sets ready to false without touching healthy. The operator is alive but not ready to accept traffic. Use it for transient conditions.

Unhealthy() sets healthy to false to signal a fatal condition. The liveness probe then fails and Kubernetes restarts the pod.
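The flag transitions described above can be modelled in a few lines. The sketch below is an assumption-laden illustration: the method names come from this page, but the `healthState` type, the `markServing` and `snapshot` helpers, the mutex, and the signatures are hypothetical and may differ from Orkestra's actual code.

```go
package main

import "sync"

// healthState is a hypothetical model of the four flags in the table above.
type healthState struct {
	mu      sync.Mutex
	started bool
	startup bool
	healthy bool
	ready   bool
}

// markServing models the HTTP server binding: started and healthy are set.
func (h *healthState) markServing() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.started = true
	h.healthy = true
}

// SetStartupComplete sets startup; nothing in this model clears it again.
func (h *healthState) SetStartupComplete() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.startup = true
}

// SetReady marks the operator ready to accept traffic.
func (h *healthState) SetReady() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.ready = true
}

// Degraded clears ready and leaves healthy alone (transient condition).
func (h *healthState) Degraded() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.ready = false
}

// Shutdown clears ready so the readiness probe starts failing.
func (h *healthState) Shutdown() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.ready = false
}

// Unhealthy clears healthy (fatal condition); the liveness probe then fails.
func (h *healthState) Unhealthy() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.healthy = false
}

// snapshot returns the four flags in table order.
func (h *healthState) snapshot() (started, startup, healthy, ready bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.started, h.startup, h.healthy, h.ready
}
```

In this model a call to `Degraded()` followed by `SetReady()` returns the operator to service, whereas `Unhealthy()` has no inverse, which mirrors the transient versus fatal distinction.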

CRD-level health

Each operatorBox tracks its own CRDHealth instance — see CRD Health.

Katalog API routes

The health server also hosts all CRD-specific endpoints. The runtime registers them before Start() is called, and routes cannot be added once the server is running. The routes are:

GET /katalog              — all CRDs with health summary
GET /katalog/{crd}        — configuration + health for one CRD
GET /katalog/{crd}/health — live health status for one CRD

The /katalog/{crd} endpoint includes per-provider stats when providers are declared.
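One way to enforce the register-before-Start() rule is to freeze the route table when the server starts. The sketch below shows that pattern only; `routeTable`, `Register`, `errAlreadyStarted`, and `katalogIndex` are hypothetical names, and Orkestra's own mechanism is not described on this page.

```go
package main

import (
	"errors"
	"net/http"
	"sync"
)

// errAlreadyStarted is returned when a route is registered too late.
var errAlreadyStarted = errors.New("health server already started; routes are frozen")

// routeTable is a hypothetical wrapper that rejects late registrations.
type routeTable struct {
	mu      sync.Mutex
	mux     *http.ServeMux
	started bool
}

func newRouteTable() *routeTable {
	return &routeTable{mux: http.NewServeMux()}
}

// Register adds a handler, or fails if Start has already been called.
func (t *routeTable) Register(pattern string, fn func(http.ResponseWriter, *http.Request)) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.started {
		return errAlreadyStarted
	}
	t.mux.HandleFunc(pattern, fn)
	return nil
}

// Start freezes the table and returns the handler that should be served.
func (t *routeTable) Start() http.Handler {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.started = true
	return t.mux
}

// katalogIndex is a placeholder handler standing in for GET /katalog.
func katalogIndex(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}
```

Returning an error rather than panicking lets the caller decide how to handle a late registration. Note that `http.ServeMux` itself still panics if the same pattern is registered twice.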

Where to go next

CRD Health — per-CRD health tracking, degradation logic, and health endpoints