Secure by Design
The typical pattern for security in infrastructure tooling is additive: build the system first, then restrict what it can do. Add a flag to disable the admin endpoint. Strip permissions in the production role. Document which commands are “for development only.” The result is a system that is secure if you remember to configure it correctly — and fragile everywhere you forget.
Orkestra inverts this. Security is not a layer added afterward. It is a property of the design — present by default at every layer, from the binaries that ship to the permissions they request to the rules that govern what reaches the cluster.
The goal is simple: the system should be trustworthy by construction, not by configuration.
The binary surface is minimal by construction
Orkestra ships three compiled binaries from a single codebase. Go build tags separate them at compile time — not behind a flag, not at runtime.
| Binary | What it can do |
|---|---|
| ork (developer CLI) | Everything: validate, generate, simulate, e2e, template, init, run |
| ork (runtime, //go:build runtime) | ork run only |
| ork-gateway (gateway, //go:build gateway) | ork gate only |
The runtime binary cannot generate RBAC bundles. It cannot scaffold operators. It cannot enumerate registered CRDs or exfiltrate Katalog definitions. That code does not exist in the binary.
This is a structural guarantee, not a permissions check. A permissions check can be misconfigured. A compile-time exclusion cannot.
The developer CLI is intentionally feature-complete: ork validate, ork simulate, ork e2e, ork generate, and ork template are all available locally. Nothing is held back from the engineer writing and testing patterns. Those capabilities are withheld only from the production process that runs them.
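As a rough illustration of compile-time separation, the sketch below shows how one file of a command registry might be guarded by a build constraint. The file layout, the function names, and the exact constraint expression are assumptions made for this example, not Orkestra's actual source.

```go
//go:build !runtime && !gateway

// Developer CLI variant (hypothetical file, for example commands_dev.go).
// A sibling file guarded by `//go:build runtime` would define the same
// function and return only "run"; one guarded by `//go:build gateway`
// would return only "gate". Excluded files are never compiled in.
package main

import "fmt"

// availableCommands lists the subcommands compiled into this variant.
func availableCommands() []string {
	return []string{"validate", "generate", "simulate", "e2e", "template", "init", "run"}
}

// hasCommand reports whether name was compiled into this binary.
func hasCommand(name string) bool {
	for _, c := range availableCommands() {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("commands in this build:", availableCommands())
	fmt.Println("gate available:", hasCommand("gate"))
}
```

With a layout like this, `go build -tags runtime` would select the runtime variant of the file instead, and the generate and validate code paths would be absent from the resulting binary rather than merely disabled.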
Two trust domains that cannot cross
The Runtime and Gateway run as two separate processes with separate Kubernetes ServiceAccounts and separate ClusterRoles. Neither carries the permissions of the other.
The Runtime reconciles custom resources. It reads CRs, applies templates, manages the resources declared in onCreate and onReconcile blocks, and emits events. It has no permissions to touch webhook configurations or TLS certificates.
The Gateway serves admission webhooks — validation, mutation, deletion protection, and version conversion. It manages TLS automatically. It has no permissions to reconcile CRs or manage the resources your operator controls.
A compromise of the Runtime cannot touch webhook infrastructure. A compromise of the Gateway cannot touch your CRs. The blast radius of either failure is bounded by what that process was ever permitted to do.
When a feature is disabled in the Katalog, the Gateway removes the corresponding webhook configuration — the security surface shrinks automatically to match the declared intent.
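One way to picture this pruning is as a set difference between the webhook configurations that exist and the ones the declared features still need. The sketch below is a minimal model under that assumption; the Features type and the configuration names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Features is a hypothetical summary of the admission features a Katalog declares.
type Features struct {
	Validation         bool
	Mutation           bool
	DeletionProtection bool
	Conversion         bool
}

// desiredWebhooks returns the set of webhook configuration names that should
// exist for the declared features. The names are illustrative only.
func desiredWebhooks(f Features) map[string]bool {
	want := map[string]bool{}
	if f.Validation {
		want["validation"] = true
	}
	if f.Mutation {
		want["mutation"] = true
	}
	if f.DeletionProtection {
		want["deletion-protection"] = true
	}
	if f.Conversion {
		want["conversion"] = true
	}
	return want
}

// staleWebhooks returns, in sorted order, the existing configurations that no
// declared feature needs and that should therefore be removed.
func staleWebhooks(existing []string, f Features) []string {
	want := desiredWebhooks(f)
	var stale []string
	for _, name := range existing {
		if !want[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	existing := []string{"validation", "mutation", "deletion-protection"}
	declared := Features{Validation: true}
	fmt.Println("remove:", staleWebhooks(existing, declared))
}
```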
CRD isolation: each operator runs in its own cell
Each CRD declared in a Katalog runs inside its own OperatorBox — an isolated runtime cell with its own informer, event queue, worker pool, health state, and reconciler instance. Nothing is shared.
A panic in one reconciler (any unrecovered Go panic) is caught and logged with the full stack trace, and the affected item is requeued with backoff. That OperatorBox retries; every other OperatorBox continues uninterrupted.
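The recovery behavior described here follows a common Go pattern: a deferred recover around the reconcile call, plus a capped exponential backoff for the retry. The sketch below is a generic illustration of that pattern; the function names and the backoff constants are assumptions, not Orkestra's implementation.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

// safeReconcile runs fn and converts a panic into an ordinary error that
// carries the stack trace, so the caller can log it and requeue the key.
func safeReconcile(key string, fn func(string) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic reconciling %s: %v\n%s", key, r, debug.Stack())
		}
	}()
	return fn(key)
}

// backoff returns the delay before retry number attempt (0-based): the base
// delay doubled once per attempt, capped at maxDelay. Values are illustrative.
func backoff(attempt int) time.Duration {
	const baseDelay = 500 * time.Millisecond
	const maxDelay = 5 * time.Minute
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

func main() {
	err := safeReconcile("default/site-a", func(string) error {
		panic("boom")
	})
	fmt.Println("recovered:", err != nil, "retry in", backoff(0))
}
```

Because the panic is turned into an error inside the worker that hit it, workers belonging to other cells never observe it.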
Queue pressure in one OperatorBox does not affect reconcile latency in another. A misbehaving CRD does not destabilize the rest of the platform.
Communication between OperatorBoxes is always opt-in via a cross: declaration in the Katalog. What is not declared does not happen.
RBAC is derived, not authored
Orkestra never auto-creates permissions. Every right your operator has is computed from what you declared in the Katalog and reviewed by you before it reaches the cluster.
ork validate --full # see exactly what permissions will be requested
ork generate bundle # produce the bundle containing those permissions
kubectl apply -f bundle.yaml
The generated bundle contains two separate ClusterRoles, one for the Runtime and one for the Gateway, and they do not overlap. The Control Center needs neither. Gateway permissions are only generated if the features that require them are declared: no validation rules means no admissionregistration.k8s.io entries in the bundle at all.
Traditional operators often ship with:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
The Orkestra bundle contains:
- apiGroups: ["platform.orkestra.io"]
  resources: ["websites"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["platform.orkestra.io"]
  resources: ["websites/status"]
  verbs: ["get", "update", "patch"]
Only the API groups you declared. Only the resource kinds those groups produce. Only the verbs those resources need. Built-in resources (Deployments, Services, ConfigMaps) only appear if your Katalog actually uses them.
The bundle diffs cleanly in GitOps workflows. Every change to the Katalog produces a visible, reviewable diff in the bundle before it reaches the cluster.
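To make "derived, not authored" concrete, the sketch below computes rules from a list of declared CRDs and emits Gateway rules only when validation is declared. The types are simplified stand-ins for the Kubernetes RBAC types, and the Gateway resource and verb list are assumptions chosen for illustration.

```go
package main

import "fmt"

// PolicyRule mirrors the three fields of an RBAC rule shown above.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// Declared is a hypothetical summary of one CRD declared in a Katalog.
type Declared struct {
	Group    string
	Resource string
}

// runtimeRules derives one rule for each declared resource and one for its
// status subresource. Nothing outside the declared groups is ever emitted.
func runtimeRules(crds []Declared) []PolicyRule {
	var rules []PolicyRule
	for _, c := range crds {
		rules = append(rules,
			PolicyRule{
				APIGroups: []string{c.Group},
				Resources: []string{c.Resource},
				Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
			},
			PolicyRule{
				APIGroups: []string{c.Group},
				Resources: []string{c.Resource + "/status"},
				Verbs:     []string{"get", "update", "patch"},
			},
		)
	}
	return rules
}

// gatewayRules emits admissionregistration.k8s.io access only when validation
// is declared. The resource name and verbs are assumptions for this sketch.
func gatewayRules(hasValidation bool) []PolicyRule {
	if !hasValidation {
		return nil
	}
	return []PolicyRule{{
		APIGroups: []string{"admissionregistration.k8s.io"},
		Resources: []string{"validatingwebhookconfigurations"},
		Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
	}}
}

func main() {
	crds := []Declared{{Group: "platform.orkestra.io", Resource: "websites"}}
	for _, r := range runtimeRules(crds) {
		fmt.Println(r.APIGroups, r.Resources, r.Verbs)
	}
	fmt.Println("gateway rules without validation:", len(gatewayRules(false)))
}
```

Since the output is a pure function of the declarations, the same Katalog always yields the same rules, which is what keeps diffs small and reviewable.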
The containers are hardened by default
Both production binaries run in a distroless base image: no shell, no package manager, no curl, no tar, no standard Unix utilities. An attacker who gains code execution in the container has a single static binary and nothing to pivot with.
The Helm chart applies a hardened security context to every pod by default:
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
These are on by default. Nothing needs to be configured to get them. They apply to the runtime, the gateway, and the control center.
The same security profiles are available to workloads you declare in your Katalog:
securityContext:
  profile: hardened
podSecurity:
  profile: hardened
hardened maps to readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ["ALL"], and UID 65534. It is a one-line declaration, so there is no need to repeat the same fields across every Deployment in the Katalog.
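The expansion of a profile name into concrete fields can be sketched as a simple lookup. The struct below is a simplified stand-in for the Kubernetes security context, and only the hardened mapping stated above is modeled; other profile names are rejected here because their mappings are not given in this section.

```go
package main

import "fmt"

// SecurityContext is a simplified, hypothetical stand-in for the fields that
// the hardened profile is described as setting.
type SecurityContext struct {
	ReadOnlyRootFilesystem bool
	RunAsNonRoot           bool
	RunAsUser              int64
	DropCapabilities       []string
}

// expandProfile turns a one-word profile into explicit settings.
func expandProfile(name string) (SecurityContext, error) {
	switch name {
	case "hardened":
		return SecurityContext{
			ReadOnlyRootFilesystem: true,
			RunAsNonRoot:           true,
			RunAsUser:              65534,
			DropCapabilities:       []string{"ALL"},
		}, nil
	default:
		// Other profiles are outside the scope of this sketch.
		return SecurityContext{}, fmt.Errorf("profile %q is not modeled in this sketch", name)
	}
}

func main() {
	sc, err := expandProfile("hardened")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", sc)
}
```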
Each layer is designed for the one below it to fail
This is the principle that ties every security property together.
The level-triggered reconciler assumes it will be interrupted: a crash, a SIGKILL, or a node failure each leaves a partial state that the next reconcile corrects. The panic recovery in each OperatorBox assumes individual reconciles will fail. The isolated worker pools assume that any one CRD may fail. The leader election assumes the entire process will crash. The admission layer assumes the reconciler may not see every CR. The production binary assumes someone might try to misuse whatever surface is exposed.
Each layer is designed for the one below it to fail, and to remain correct when it does.
The result is a system where trustworthy behavior does not depend on everything going right. It depends on the guarantees holding even when things go wrong.
Validation runs at five independent points
Security rules declared in a Katalog are not enforced once. They are enforced at every layer where enforcement is possible:
1. Parse time — strict YAML: unknown fields are errors, not silent defaults
2. ork validate — offline: schema, templates, dependency graph, namespace rules
3. ork simulate — offline: reconcile loop against in-memory state
4. Admission webhook — live: Gateway intercepts CREATE/UPDATE before etcd storage
5. Reconcile time — live: Runtime re-checks every rule on every reconcile cycle
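The first enforcement point, strict parsing, can be illustrated with Go's standard library. Orkestra parses YAML; the standard library has no YAML decoder, so this sketch uses JSON as a stand-in to show the same idea, and the Spec type is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical two-field document shape.
type Spec struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

// parseStrict decodes data and fails on any field that Spec does not define,
// instead of silently dropping it and falling back to a zero value.
func parseStrict(data []byte) (Spec, error) {
	var s Spec
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&s); err != nil {
		return Spec{}, err
	}
	return s, nil
}

func main() {
	// "replicsa" is a typo; a lenient parser would leave Replicas at 0.
	_, err := parseStrict([]byte(`{"name": "site", "replicsa": 3}`))
	fmt.Println("rejected:", err != nil)
}
```

The point of failing loudly is that a misspelled security-relevant field cannot quietly turn into its permissive default.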
A deny rule that catches a bad CR at admission time will also catch it if the CR somehow bypasses the webhook, because the Runtime enforces rules independently of the Gateway. A rule can be violated only if both layers fail.
Reconcile-time outcomes are observable without reading logs. When a deny rule fires, the Runtime writes ValidationFailed=True to the CR’s status conditions. When a warn rule fires, it writes ValidationWarning=True with the message. Both are visible in the Control Center Conditions tab and via kubectl get <cr> -o yaml. Admission-time rejections surface at the terminal — the CR is never stored.
Each enforcement point assumes the one before it may be absent or imperfect. This is not redundancy — it is how the system stays correct when parts of it fail.
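A small model of the two live enforcement points: the same rule function is called on the admission path, where a failure rejects the write, and again on the reconcile path, where a failure is reported as a status condition. The CR shape and the replica limit are hypothetical; the ValidationFailed condition name comes from the text above.

```go
package main

import "fmt"

// CR is a hypothetical custom resource with one field under policy.
type CR struct {
	Name     string
	Replicas int
}

// Condition models a status condition entry.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// denyTooManyReplicas is an illustrative deny rule shared by both paths.
func denyTooManyReplicas(cr CR) error {
	if cr.Replicas > 10 {
		return fmt.Errorf("replicas %d exceeds the limit of 10", cr.Replicas)
	}
	return nil
}

// admit models the Gateway path: a rejected CR is never stored.
func admit(store map[string]CR, cr CR) error {
	if err := denyTooManyReplicas(cr); err != nil {
		return err
	}
	store[cr.Name] = cr
	return nil
}

// reconcile models the Runtime path: the rule is re-checked on every cycle
// and a violation is surfaced as a condition rather than only in logs.
func reconcile(cr CR) []Condition {
	if err := denyTooManyReplicas(cr); err != nil {
		return []Condition{{Type: "ValidationFailed", Status: "True", Message: err.Error()}}
	}
	return nil
}

func main() {
	store := map[string]CR{}
	fmt.Println("admission error:", admit(store, CR{Name: "big", Replicas: 50}))

	// Simulate a CR that reached storage without passing the webhook.
	store["big"] = CR{Name: "big", Replicas: 50}
	fmt.Println("reconcile conditions:", reconcile(store["big"]))
}
```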
Orkestra’s own credentials are never hardcoded
A Katalog is a behavioral contract — a description of what the operator does. It is committed to source control, reviewed in pull requests, distributed as a versioned OCI artifact. Anywhere Orkestra itself needs a credential — to fetch a private source, to send a notification, to pull from a protected registry — the credential is named, not embedded.
The pattern is consistent: the YAML names an environment variable; the runtime resolves it.
File source authentication — when a Katalog imports a private file over HTTPS or from GitHub:
files:
  - url: https://private.host/platform-policy.yaml
    auth:
      type: bearer
      fromEnv: PLATFORM_TOKEN
  - url: https://github.com/myorg/private-registry
    auth:
      type: github
      fromEnv: GITHUB_TOKEN
  - url: https://internal.host/policy.yaml
    auth:
      type: basic
      usernameFromEnv: REGISTRY_USER
      passwordFromEnv: REGISTRY_PASSWORD
Registry source authentication — when a Komposer pulls from a private OCI registry or a private Git registry:
imports:
  registry:
    - url: registry.myorg.com/operators/postgres@v14-hardened
      oci: true
      auth:
        type: basic
        usernameFromEnv: REGISTRY_USER
        passwordFromEnv: REGISTRY_PASSWORD
    - url: https://github.com/myorg/private-registry
      auth:
        type: github
        fromEnv: GITHUB_TOKEN
Notification credentials — SMTP credentials (SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASS, SMTP_FROM) are read from the runtime’s process environment at startup. They are never declared in the Katalog YAML. The Slack webhook URL is declared per team in the notification block; it should be treated with the same care as any other credential — injected via the Helm chart’s runtime.env block rather than committed to source control.
In each case, the YAML file that ships in the OCI artifact contains no credential. The artifact is the same in every environment. Only the environment it runs in differs.
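The "named, not embedded" pattern might be resolved at runtime roughly as follows. The Auth struct mirrors the auth keys in the YAML examples above; the function names, the replaceable getenv hook, and the choice to send a GitHub token as a Bearer header are assumptions for this sketch.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// Auth mirrors the auth block from the YAML: it holds variable names only.
type Auth struct {
	Type            string
	FromEnv         string
	UsernameFromEnv string
	PasswordFromEnv string
}

// getenv is a package-level hook so the lookup can be replaced in tests.
var getenv = os.LookupEnv

// lookup resolves a variable and, on failure, names the variable in the
// error without ever including a credential value.
func lookup(name string) (string, error) {
	v, ok := getenv(name)
	if !ok || v == "" {
		return "", fmt.Errorf("environment variable %s is not set", name)
	}
	return v, nil
}

// authorizationHeader turns an Auth declaration into a header value.
func authorizationHeader(a Auth) (string, error) {
	switch a.Type {
	case "bearer", "github":
		tok, err := lookup(a.FromEnv)
		if err != nil {
			return "", err
		}
		return "Bearer " + tok, nil
	case "basic":
		user, err := lookup(a.UsernameFromEnv)
		if err != nil {
			return "", err
		}
		pass, err := lookup(a.PasswordFromEnv)
		if err != nil {
			return "", err
		}
		return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+pass)), nil
	default:
		return "", fmt.Errorf("unsupported auth type %q", a.Type)
	}
}

func main() {
	_, err := authorizationHeader(Auth{Type: "bearer", FromEnv: "PLATFORM_TOKEN"})
	if err != nil {
		fmt.Println("cannot authenticate:", err)
		return
	}
	fmt.Println("credential resolved from environment")
}
```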
Where this appears across the documentation
- Binaries & build tags: complete command matrix by binary
- RBAC: full bundle contents and --for flag for component-scoped generation
- Admission webhooks: validation, mutation, and conversion details
- Deletion protection: protecting CRs and Orkestra's own infrastructure
- Validation pipeline: all five enforcement points in detail
- Pod security: baseline, restricted, hardened profiles
- Trust and Failure Model: how the security layers compound