Resilience Pack

Operators that stay running through panics, bad input, and degraded state.

ork init --pack resilience

The examples

| Directory | What it demonstrates |
| --- | --- |
| safe-reconcile | Panic isolation — a nil pointer in a typed hook is caught by safeReconcile, logged with a stack trace, and re-queued with backoff. Other CRDs keep reconciling. The process never crashes. |
| admission-protection | Runtime validation as a resilience layer — a bad CR degrades the operator after failureThreshold is exceeded. Patch the CR and the operator recovers automatically. No restart needed. |
| crd-missing-recovery | Runtime CRD watch without deletion protection. Delete the CRD at runtime — Orkestra detects the disappearance, degrades, and retries in a loop. Re-apply the CRD and CR and the operator recovers with no restart. |
| leader-failover | High-availability leader election. Deploy with replicaCount: 2, kill the konductor pod — a follower is elected within leaseDuration and reconciliation continues with no manual intervention. |

What every example shows

  • Orkestra stays Operational at the runtime level even when individual operators are Degraded
  • The Control Center shows the exact failure — consecutive fail count, last error, stack trace (for panics)
  • Recovery is automatic — when the root cause is fixed, the operator moves from Degraded → Pending → Healthy without intervention
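
The panic isolation and degrade-then-recover behaviour listed above, as an added Go example that is not Orkestra's own code. `Operator`, `Spec`, `Hook`, `SafeReconcile` and `fail` are hypothetical names, the inputs are constructed, and the intermediate Pending state is left out for brevity.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Operator carries the fields the page says the Control Center shows (hypothetical names).
type Operator struct {
	FailureThreshold, ConsecutiveFails int
	State, LastError, Stack            string // Stack is set only for panics
}
type Spec struct{ Replicas int } // constructed CR spec; a nil *Spec makes Hook panic

func Hook(s *Spec) error {
	if s.Replicas < 0 {
		return fmt.Errorf("negative replicas: %d", s.Replicas)
	}
	return nil
}

// SafeReconcile isolates a panic in Hook and reports whether to re-queue with backoff.
func (o *Operator) SafeReconcile(s *Spec) (requeue bool) {
	defer func() {
		if r := recover(); r != nil {
			requeue = o.fail(fmt.Sprintf("panic in hook: %v", r), string(debug.Stack()))
		}
	}()
	if err := Hook(s); err != nil {
		return o.fail(err.Error(), "")
	}
	o.State, o.ConsecutiveFails, o.LastError, o.Stack = "Healthy", 0, "", "" // root cause fixed
	return false
}

func (o *Operator) fail(msg, stack string) bool {
	o.ConsecutiveFails++
	o.LastError, o.Stack = msg, stack
	if o.ConsecutiveFails >= o.FailureThreshold {
		o.State = "Degraded" // threshold reached: degrade, keep the process alive
	}
	return true
}

func main() {
	op := &Operator{FailureThreshold: 2, State: "Healthy"}
	for _, s := range []*Spec{nil, nil, {Replicas: 1}} { // constructed inputs
		fmt.Println(op.SafeReconcile(s), op.State, op.ConsecutiveFails, op.LastError)
	}
}
```
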

Run the full suite

ork e2e -f resilience/e2e.yaml

Or simulate without a cluster:

ork simulate -f resilience/simulate.yaml