Operator Webhook Flow
The Lucid Operator uses a Kubernetes Mutating Admission Webhook to inject auditor sidecars into AI workload pods. This document explains the webhook flow in detail.
Overview
The Operator implements two webhooks:
- Mutating Webhook (/mutate): Injects auditor sidecars into pods
- Validating Webhook (/validate): Verifies auditor images are notarized
sequenceDiagram
participant User
participant API as K8s API Server
participant MutateWH as Mutating Webhook
participant ValidateWH as Validating Webhook
participant Verifier as Lucid Verifier
participant Scheduler as K8s Scheduler
User->>API: Create Pod (with lucid.io/secured=true)
API->>MutateWH: MutatingAdmissionReview
MutateWH->>MutateWH: Check labels
MutateWH->>MutateWH: Inject sidecar containers
MutateWH->>API: Return mutated Pod spec
API->>ValidateWH: ValidatingAdmissionReview
ValidateWH->>Verifier: Check image notarization
Verifier-->>ValidateWH: Notarization status
ValidateWH->>API: Allow/Deny
API->>Scheduler: Schedule Pod (if allowed)
Scheduler->>Node: Start Pod with sidecars
Injection Trigger
Pods are injected with auditor sidecars when they have the label:
metadata:
  labels:
    lucid.io/secured: "true"
This single label replaces the previous annotation-based trigger. Auditor selection and audit mode are still read from the annotations described in Step 2.
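The label check can be sketched in a few lines of Python. This is an illustration only: the function name `should_inject` and the dict-shaped pod are assumptions, not the Operator's actual code.

```python
SECURED_LABEL = "lucid.io/secured"  # label key taken from the example above


def should_inject(pod: dict) -> bool:
    """Return True when the pod carries lucid.io/secured: "true" (hypothetical helper)."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    # Label values are always strings in Kubernetes, so compare against "true".
    return labels.get(SECURED_LABEL) == "true"
```

Pods without the label, or with any other value, are left untouched in this sketch.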
Webhook Registration
The Operator registers a MutatingWebhookConfiguration on startup:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: lucid-operator
webhooks:
  - name: mutate.lucid.computing
    clientConfig:
      service:
        name: lucid-operator-webhook
        namespace: lucid-system
        path: /mutate
      caBundle: <base64-encoded-ca>
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        lucid.computing/enabled: "true"
    failurePolicy: Fail
Mutation Process
Step 1: Request Interception
When a pod is created in a labeled namespace, the API server sends an AdmissionReview to the webhook:
{
  "kind": "AdmissionReview",
  "request": {
    "uid": "abc-123",
    "kind": {"group": "", "version": "v1", "kind": "Pod"},
    "operation": "CREATE",
    "object": {
      "metadata": {
        "annotations": {
          "lucid.computing/auditors": "injection,toxicity"
        }
      },
      "spec": {
        "containers": [...]
      }
    }
  }
}
Step 2: Annotation Parsing
The webhook extracts auditor configuration from annotations:
| Annotation | Description | Example |
|---|---|---|
| lucid.computing/auditors | Auditors to inject | injection,toxicity |
| lucid.computing/audit-mode | Mode (enforce/observe) | enforce |
| lucid.computing/skip-audit | Skip injection | false |
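A minimal sketch of how these three annotations might be parsed. The name `parse_annotations` matches the call shown under Failure Modes, but the body, the `AuditConfig` type, and the defaults (`enforce`, no skip) are assumptions made for illustration.

```python
from dataclasses import dataclass

PREFIX = "lucid.computing/"


@dataclass
class AuditConfig:  # hypothetical container for the parsed values
    auditors: list[str]
    mode: str
    skip: bool


def parse_annotations(pod: dict) -> AuditConfig:
    """Read the auditor annotations from a pod dict; raise ValueError on a bad mode."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    raw = annotations.get(PREFIX + "auditors", "")
    # Comma-separated list; ignore blanks and surrounding whitespace.
    auditors = [name.strip() for name in raw.split(",") if name.strip()]
    mode = annotations.get(PREFIX + "audit-mode", "enforce")  # default is assumed
    if mode not in ("enforce", "observe"):
        raise ValueError(f"unknown audit mode: {mode!r}")
    skip = annotations.get(PREFIX + "skip-audit", "false").lower() == "true"
    return AuditConfig(auditors=auditors, mode=mode, skip=skip)
```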
Step 3: Sidecar Injection
For each requested auditor, the webhook adds a sidecar container:
containers:
  - name: lucid-guardrails-auditor
    image: ghcr.io/lucid/lucid-guardrails-auditor:latest
    ports:
      - containerPort: 8090
    env:
      - name: LUCID_AUDITOR_ID
        value: "lucid-guardrails-auditor"
      - name: LUCID_CHAIN_NEXT
        value: "http://localhost:8093"  # Next auditor in chain
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"
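The container above could be produced by a small builder like the one below. This is a sketch: `build_sidecar` is a hypothetical name, and the image naming pattern is inferred from the single example rather than confirmed by the Operator source.

```python
def build_sidecar(name: str, port: int, next_url: str) -> dict:
    """Build one auditor sidecar container spec as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        # Assumed pattern: image name equals the auditor name under ghcr.io/lucid/.
        "image": f"ghcr.io/lucid/{name}:latest",
        "ports": [{"containerPort": port}],
        "env": [
            {"name": "LUCID_AUDITOR_ID", "value": name},
            {"name": "LUCID_CHAIN_NEXT", "value": next_url},
        ],
        "resources": {
            "requests": {"memory": "128Mi", "cpu": "100m"},
            "limits": {"memory": "256Mi", "cpu": "500m"},
        },
    }


sidecar = build_sidecar("lucid-guardrails-auditor", 8090, "http://localhost:8093")
```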
Step 4: Chain Configuration
The webhook configures the auditor chain by setting environment variables:
flowchart LR
A["Injection (8090)<br/>CHAIN_NEXT=8093"] --> B["Toxicity (8093)<br/>CHAIN_NEXT=5000"]
B --> C["AI Model<br/>(5000)"]
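One way to derive each auditor's next hop is to walk the ordered list and point every entry at its successor, with the last entry pointing at the model. The sketch below assumes the ports from the diagram; `link_chain` is a hypothetical name.

```python
def link_chain(auditors: list[tuple[str, int]], model_port: int) -> dict[str, str]:
    """Map each auditor name to the URL it should forward to (hypothetical helper)."""
    next_hop: dict[str, str] = {}
    for index, (name, _port) in enumerate(auditors):
        if index + 1 < len(auditors):
            target_port = auditors[index + 1][1]  # next auditor in the chain
        else:
            target_port = model_port  # last auditor forwards to the AI model
        next_hop[name] = f"http://localhost:{target_port}"
    return next_hop


hops = link_chain([("injection", 8090), ("toxicity", 8093)], model_port=5000)
```

Each value would then be written into the corresponding container's LUCID_CHAIN_NEXT variable.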
Step 5: Response
The webhook returns a JSON Patch to mutate the pod:
{
  "kind": "AdmissionReview",
  "response": {
    "uid": "abc-123",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL3NwZWMvY29udGFpbmVycyIsInZhbHVlIjpbLi4uXX1d"
  }
}
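The patch field is the base64 encoding of a JSON Patch document; the value in the example decodes to a single add operation on /spec/containers. A sketch of building such a response follows. `build_response` is a hypothetical name, and appending with the `/spec/containers/-` path is one possible choice, not necessarily what the Operator does.

```python
import base64
import json


def build_response(uid: str, patch_ops: list[dict]) -> dict:
    """Wrap JSON Patch operations in an AdmissionReview response (hypothetical helper)."""
    patch_bytes = json.dumps(patch_ops).encode("utf-8")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,  # must echo the uid from the request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch_bytes).decode("ascii"),
        },
    }


# Append one container to the existing list ("-" means end of array in JSON Patch).
ops = [{"op": "add", "path": "/spec/containers/-", "value": {"name": "lucid-guardrails-auditor"}}]
review = build_response("abc-123", ops)
```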
TLS Certificate Management
The webhook requires TLS for secure communication with the API server. Proper certificate management is critical for production deployments.
Self-Signed Certificates (Development)
The Operator auto-generates certificates on startup using the cryptography library:
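# Simplified certificate generation (see cluster.py for full implementation)
from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate key pair
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Self-signed: subject and issuer are the same name
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "lucid-operator")])
now = datetime.now(timezone.utc)

# Build certificate with SANs
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(private_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + timedelta(days=365))  # default validity, see Certificate Rotation
    .add_extension(
        x509.SubjectAlternativeName([
            x509.DNSName("lucid-operator.lucid-system.svc"),
            x509.DNSName("lucid-operator.lucid-system.svc.cluster.local"),
            x509.DNSName("lucid-operator"),
        ]),
        critical=False,
    )
    .sign(private_key, hashes.SHA256())
)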
The certificate and key are stored in a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
  name: lucid-operator-tls
  namespace: lucid-system
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded-certificate>
  tls.key: <base64-encoded-private-key>
cert-manager (Production)
For production deployments, use cert-manager for automated certificate lifecycle management.
Prerequisites
- Install cert-manager in your cluster:
  kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
- Create a ClusterIssuer (self-signed for internal use):
  apiVersion: cert-manager.io/v1
  kind: ClusterIssuer
  metadata:
    name: lucid-selfsigned-issuer
  spec:
    selfSigned: {}
Or use an internal CA:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: lucid-ca-issuer
spec:
  ca:
    secretName: lucid-ca-key-pair
Certificate Resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: lucid-operator-cert
  namespace: lucid-system
spec:
  secretName: lucid-operator-tls
  duration: 8760h  # 1 year
  renewBefore: 720h  # 30 days before expiry
  issuerRef:
    name: lucid-selfsigned-issuer
    kind: ClusterIssuer
  commonName: lucid-operator
  dnsNames:
    - lucid-operator
    - lucid-operator.lucid-system
    - lucid-operator.lucid-system.svc
    - lucid-operator.lucid-system.svc.cluster.local
  usages:
    - server auth
    - digital signature
    - key encipherment
Webhook Configuration with cert-manager
Use cert-manager's CA injector to automatically update the webhook's caBundle:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: lucid-operator-webhook
  annotations:
    cert-manager.io/inject-ca-from: lucid-system/lucid-operator-cert
webhooks:
  - name: mutate.lucid.computing
    clientConfig:
      service:
        name: lucid-operator
        namespace: lucid-system
        path: /mutate
      # caBundle is auto-injected by cert-manager
    # ... rest of config
Certificate Rotation
Self-Signed (Development)
The Operator monitors certificate expiration and regenerates before expiry:
- Certificate validity: 365 days by default
- Renewal check: Every 24 hours
- Renewal trigger: 30 days before expiry
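The renewal trigger amounts to a date comparison that would run on each 24-hour check. A minimal sketch, assuming the 30-day window above; `needs_renewal` is a hypothetical name and the real check lives in the Operator.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(days=30)  # renewal window from the list above


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before: timedelta = RENEW_BEFORE) -> bool:
    """Return True once `now` is inside the renewal window before expiry."""
    return now >= not_after - renew_before


expiry = datetime(2025, 12, 31, tzinfo=timezone.utc)
early = needs_renewal(expiry, datetime(2025, 11, 1, tzinfo=timezone.utc))
late = needs_renewal(expiry, datetime(2025, 12, 10, tzinfo=timezone.utc))
```

With these dates the window opens on 1 December, so only the second check falls inside it.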
cert-manager (Production)
cert-manager handles rotation automatically:
- Monitors certificate expiration
- Renews based on the renewBefore setting
- Updates the Secret with the new certificate
- CA injector updates the webhook caBundle
Important: Ensure the Operator pod restarts or reloads the certificate when it changes. Options:
- Reloader: Use stakater/Reloader to auto-restart pods:
  metadata:
    annotations:
      secret.reloader.stakater.com/reload: "lucid-operator-tls"
- File watch: The Operator watches the mounted certificate files for changes
Production Checklist
Before deploying to production, verify:
- cert-manager installed and ClusterIssuer configured
- Certificate resource created with appropriate DNS names
- CA injection annotation added to webhook configurations
- Certificate renewal tested (shorten duration temporarily)
- Pod restart mechanism configured for certificate rotation
- Monitoring set up for certificate expiration (Prometheus alerts)
Monitoring Certificates
Prometheus Metrics
cert-manager exposes metrics for certificate monitoring:
# Certificates expiring within 7 days
certmanager_certificate_expiration_timestamp_seconds - time() < 604800
# Certificates that are not in the Ready state
certmanager_certificate_ready_status{condition="True"} == 0
Alert Example
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: lucid-cert-alerts
spec:
  groups:
    - name: certificates
      rules:
        - alert: LucidOperatorCertExpiringSoon
          expr: |
            certmanager_certificate_expiration_timestamp_seconds{name="lucid-operator-cert"}
              - time() < 604800
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: Lucid Operator certificate expiring soon
            description: Certificate expires in {{ $value | humanizeDuration }}
Troubleshooting
Certificate Errors
Symptom: Pods fail to create with webhook errors
Error from server: error when creating "pod.yaml": Internal error occurred:
failed calling webhook "mutate.lucid.computing": x509: certificate signed by unknown authority
Resolution:
1. Verify caBundle in webhook configuration matches the certificate
2. Check cert-manager logs: kubectl logs -n cert-manager -l app=cert-manager
3. Manually update caBundle if using self-signed certificates
Certificate Expired
Symptom: All pod creations fail after certificate expiry
Resolution:
1. For self-signed: Restart the Operator to regenerate certificates
2. For cert-manager: Check the Certificate resource status:
kubectl describe certificate lucid-operator-cert -n lucid-system
DNS Name Mismatch
Symptom: TLS handshake errors in logs
Resolution: Ensure all DNS names are in the certificate:
- lucid-operator
- lucid-operator.lucid-system
- lucid-operator.lucid-system.svc
- lucid-operator.lucid-system.svc.cluster.local
Validation Webhook
After mutation, the Validating Webhook verifies that injected auditor images are trusted.
Validation Process
flowchart TD
A[Pod Spec Received] --> B{Contains Auditor Containers?}
B -->|No| C[Allow Pod]
B -->|Yes| D[Extract Image Digests]
D --> E{Strict Mode?}
E -->|Yes| F{All Images Have Digests?}
E -->|No| G[Check Notarization]
F -->|No| H[Reject: Missing Digest]
F -->|Yes| G
G --> I{Query Verifier}
I -->|Notarized| C
I -->|Not Notarized| J{Strict Mode?}
J -->|Yes| K[Reject: Not Notarized]
J -->|No| L[Warn and Allow]
Strict Mode
In production (strict_notarization=true):
- All auditor images must include a digest (e.g., image@sha256:abc123...)
- All images must be registered in the Lucid Verifier's notarization registry
- Pods are rejected if either check fails
In development (strict_notarization=false):
- Images without digests are allowed with a warning
- Unverifiable images are allowed if the Verifier is unreachable (fail-open)
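The decision logic for both modes can be sketched as a single function. This is an illustration under assumptions: `validate_images` is a hypothetical name, the Verifier lookup is passed in as a callable, and the fail-open handling for an unreachable Verifier is left out.

```python
from typing import Callable


def validate_images(images: list[str], strict: bool,
                    is_notarized: Callable[[str], bool]) -> tuple[bool, list[str]]:
    """Return (allowed, messages) for the auditor images of one pod."""
    messages: list[str] = []
    for image in images:
        # A pinned image looks like name@sha256:...; anything else has no digest.
        digest = image.split("@", 1)[1] if "@" in image else None
        if digest is None:
            if strict:
                return False, [f"missing digest: {image}"]
            messages.append(f"warning: no digest on {image}")
            continue
        if not is_notarized(digest):
            if strict:
                return False, [f"not notarized: {image}"]
            messages.append(f"warning: {image} is not notarized")
    return True, messages
```

In strict mode the first failing image rejects the pod; otherwise the warnings are collected and the pod is allowed.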
Notarization Check
The webhook queries the Verifier service:
GET /registry/check?digest=sha256:abc123...
Response:
{
  "digest": "sha256:abc123...",
  "notarized": true
}
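A client for this endpoint needs to build the query URL and interpret the JSON body. The sketch below uses only the standard library; the base URL is a placeholder and both function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def check_url(base_url: str, digest: str) -> str:
    """Build the /registry/check URL for a digest (base_url is an assumption)."""
    # safe=":" keeps the colon in "sha256:..." unescaped, matching the example request.
    query = urlencode({"digest": digest}, safe=":")
    return f"{base_url.rstrip('/')}/registry/check?{query}"


def is_notarized(body: str, digest: str) -> bool:
    """Interpret the Verifier's JSON response for the digest that was asked about."""
    data = json.loads(body)
    return data.get("digest") == digest and data.get("notarized") is True


url = check_url("http://lucid-verifier.example/", "sha256:abc123")
```

Checking that the echoed digest matches the requested one guards against acting on a response for a different image.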
Auditor Chain Architecture
Injected auditors form a linear processing chain using environment variables:
flowchart LR
A["Application (port 8000)<br/>AUDITOR_URL=localhost:8081"] --> B["Injection (port 8081)<br/>AUDITOR_URL=localhost:8082"]
B --> C["Toxicity (port 8082)<br/>AUDITOR_URL=localhost:5000"]
C --> D["AI Model<br/>(port 5000)"]
Chain Configuration Priority
- Pod Annotation: lucid.computing/auditor-chain (comma-separated names)
- ConfigMap: lucid-auditor-chain in the namespace
- Default: lucid-guardrails-auditor,lucid-guardrails-auditor
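Resolving the chain can be sketched as a first-match lookup over the three sources. The sketch assumes the list above is ordered from highest to lowest priority and that the ConfigMap value uses the same comma-separated format; `resolve_chain` is a hypothetical name.

```python
from typing import Optional

# Default value copied verbatim from the list above.
DEFAULT_CHAIN = "lucid-guardrails-auditor,lucid-guardrails-auditor"


def _split(value: Optional[str]) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty names."""
    return [name.strip() for name in (value or "").split(",") if name.strip()]


def resolve_chain(annotation: Optional[str], configmap_value: Optional[str],
                  default: str = DEFAULT_CHAIN) -> list[str]:
    """Return the first non-empty chain: annotation, then ConfigMap, then default."""
    for candidate in (annotation, configmap_value, default):
        names = _split(candidate)
        if names:
            return names
    return []
```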
Init Container
The webhook also injects a lucid-init container that sets up iptables rules to redirect traffic through the auditor chain:
iptables -t nat -N LUCID_REDIRECT
iptables -t nat -A LUCID_REDIRECT -p tcp -j REDIRECT --to-port 8081
iptables -t nat -A OUTPUT -p tcp --dport 8000 -j LUCID_REDIRECT
Failure Modes
Webhook Unreachable
With failurePolicy: Fail, pods in matching namespaces cannot be created while the webhook is unreachable. For graceful degradation, set:
failurePolicy: Ignore # Allow pods without auditors
Warning: Using Ignore means pods may run without audit protection.
Invalid Annotations
Invalid annotations are logged but don't block pod creation:
try:
    auditors = parse_annotations(pod)
except ValueError as e:
    logger.warning("invalid_annotations", error=str(e))
    auditors = []  # Continue without injection
Debugging
Check Webhook Registration
kubectl get mutatingwebhookconfigurations lucid-operator -o yaml
View Webhook Logs
kubectl logs -n lucid-system -l app=lucid-operator | grep webhook
Test Mutation
# Create a test pod (the namespace must carry lucid.computing/enabled=true)
kubectl run test-pod --image=nginx --labels="lucid.io/secured=true" --annotations="lucid.computing/auditors=injection"
# Check if sidecars were injected
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].name}'
Related Documentation
- Troubleshooting Guide — Common issues and solutions
- Architecture Overview — How the Operator fits into Lucid
- Deployment Guide — Deploying with the Operator