Operator Webhook Flow

The Lucid Operator uses a Kubernetes Mutating Admission Webhook to inject auditor sidecars into AI workload pods. This document explains the webhook flow in detail.

Overview

The Operator implements two webhooks:

  1. Mutating Webhook (/mutate): Injects auditor sidecars into pods
  2. Validating Webhook (/validate): Verifies auditor images are notarized

sequenceDiagram
    participant User
    participant API as K8s API Server
    participant MutateWH as Mutating Webhook
    participant ValidateWH as Validating Webhook
    participant Verifier as Lucid Verifier
    participant Scheduler as K8s Scheduler

    User->>API: Create Pod (with lucid.io/secured=true)
    API->>MutateWH: MutatingAdmissionReview
    MutateWH->>MutateWH: Check labels
    MutateWH->>MutateWH: Inject sidecar containers
    MutateWH->>API: Return mutated Pod spec

    API->>ValidateWH: ValidatingAdmissionReview
    ValidateWH->>Verifier: Check image notarization
    Verifier-->>ValidateWH: Notarization status
    ValidateWH->>API: Allow/Deny

    API->>Scheduler: Schedule Pod (if allowed)
    Scheduler->>Node: Start Pod with sidecars

Injection Trigger

Pods are injected with auditor sidecars when they have the label:

metadata:
  labels:
    lucid.io/secured: "true"

This single label replaces the previous annotation-based trigger. Annotations are still read to configure which auditors are injected (see Step 2 under Mutation Process).
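
The trigger check can be sketched in Python as follows. This is a minimal illustration rather than the Operator's actual code: the helper name `should_inject` and the plain-dict pod representation are assumptions.

```python
SECURED_LABEL = "lucid.io/secured"


def should_inject(pod: dict) -> bool:
    """Return True when the pod carries lucid.io/secured: "true".

    Hypothetical helper; `pod` is the pod object as a plain dict.
    """
    labels = pod.get("metadata", {}).get("labels") or {}
    # Kubernetes label values are always strings, so compare against "true".
    return labels.get(SECURED_LABEL) == "true"
```

Comparing against the string "true" (rather than a boolean) mirrors how the label is written in the pod manifest.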

Webhook Registration

The Operator registers a MutatingWebhookConfiguration on startup:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: lucid-operator
webhooks:
  - name: mutate.lucid.computing
    clientConfig:
      service:
        name: lucid-operator-webhook
        namespace: lucid-system
        path: /mutate
      caBundle: <base64-encoded-ca>
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        lucid.computing/enabled: "true"
    failurePolicy: Fail

Mutation Process

Step 1: Request Interception

When a pod is created in a namespace labeled lucid.computing/enabled: "true", the API server sends an AdmissionReview to the webhook:

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "abc-123",
    "kind": {"group": "", "version": "v1", "kind": "Pod"},
    "operation": "CREATE",
    "object": {
      "metadata": {
        "annotations": {
          "lucid.computing/auditors": "injection,toxicity"
        }
      },
      "spec": {
        "containers": [...]
      }
    }
  }
}

Step 2: Annotation Parsing

The webhook extracts auditor configuration from annotations:

Annotation                    Description               Example
lucid.computing/auditors      Auditors to inject        injection,toxicity
lucid.computing/audit-mode    Mode (enforce/observe)    enforce
lucid.computing/skip-audit    Skip injection            false
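
A parser for these annotations might look like the sketch below. The name `parse_annotations` appears elsewhere in this document, but its signature, the `AuditorConfig` type, and the default values used here are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

PREFIX = "lucid.computing/"
VALID_MODES = ("enforce", "observe")


@dataclass
class AuditorConfig:
    """Hypothetical container for the parsed annotation values."""
    auditors: List[str] = field(default_factory=list)
    mode: str = "enforce"  # assumed default when the annotation is absent
    skip: bool = False


def parse_annotations(pod: dict) -> AuditorConfig:
    annotations = pod.get("metadata", {}).get("annotations") or {}

    # Comma-separated auditor names; ignore empty entries and whitespace.
    raw = annotations.get(PREFIX + "auditors", "")
    auditors = [name.strip() for name in raw.split(",") if name.strip()]

    mode = annotations.get(PREFIX + "audit-mode", "enforce")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid audit-mode: {mode!r}")

    skip = annotations.get(PREFIX + "skip-audit", "false").lower() == "true"
    return AuditorConfig(auditors=auditors, mode=mode, skip=skip)
```

Raising ValueError on an unknown mode lets the caller decide whether to log and continue or to reject the request.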

Step 3: Sidecar Injection

For each requested auditor, the webhook adds a sidecar container:

containers:
  - name: lucid-guardrails-auditor
    image: ghcr.io/lucid/lucid-guardrails-auditor:latest
    ports:
      - containerPort: 8090
    env:
      - name: LUCID_AUDITOR_ID
        value: "lucid-guardrails-auditor"
      - name: LUCID_CHAIN_NEXT
        value: "http://localhost:8093"  # Next auditor in chain
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"
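
As a rough sketch, the container spec above could be produced by a small builder function. The function name `build_sidecar` and its parameters are hypothetical; the field values are copied from the example above.

```python
def build_sidecar(auditor_id: str, image: str, port: int, next_port: int) -> dict:
    """Build one auditor sidecar container spec as a plain dict (hypothetical helper)."""
    return {
        "name": auditor_id,
        "image": image,
        "ports": [{"containerPort": port}],
        "env": [
            {"name": "LUCID_AUDITOR_ID", "value": auditor_id},
            # Next hop in the auditor chain (another auditor or the AI model).
            {"name": "LUCID_CHAIN_NEXT", "value": f"http://localhost:{next_port}"},
        ],
        "resources": {
            "requests": {"memory": "128Mi", "cpu": "100m"},
            "limits": {"memory": "256Mi", "cpu": "500m"},
        },
    }
```

For example, `build_sidecar("lucid-guardrails-auditor", "ghcr.io/lucid/lucid-guardrails-auditor:latest", 8090, 8093)` reproduces the container shown above.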

Step 4: Chain Configuration

The webhook configures the auditor chain by setting environment variables:

flowchart LR
    A["Injection (8090)<br/>CHAIN_NEXT=8093"] --> B["Toxicity (8093)<br/>CHAIN_NEXT=5000"]
    B --> C["AI Model<br/>(5000)"]
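
The linking shown in the diagram amounts to pointing each auditor at the next port in the list, with the last auditor pointing at the model. A minimal sketch, assuming auditors are identified by their listening ports and that `link_chain` is a hypothetical helper name:

```python
from typing import Dict, List


def link_chain(auditor_ports: List[int], model_port: int) -> Dict[int, str]:
    """Map each auditor port to the CHAIN_NEXT URL it should forward to."""
    hops = auditor_ports + [model_port]
    # Pair every auditor with the hop that follows it in the chain.
    return {
        port: f"http://localhost:{next_port}"
        for port, next_port in zip(auditor_ports, hops[1:])
    }


print(link_chain([8090, 8093], 5000))
# {8090: 'http://localhost:8093', 8093: 'http://localhost:5000'}
```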

Step 5: Response

The webhook returns a JSON Patch to mutate the pod:

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "abc-123",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL3NwZWMvY29udGFpbmVycyIsInZhbHVlIjpbLi4uXX1d"
  }
}
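
The patch field is a base64-encoded JSON Patch document. The sketch below shows one way such a response could be assembled; `build_admission_response` is a hypothetical helper, and it appends each sidecar with the path /spec/containers/- (an assumption chosen so existing containers are left untouched).

```python
import base64
import json
from typing import List


def build_admission_response(uid: str, sidecars: List[dict]) -> dict:
    """Build an AdmissionReview response that appends sidecar containers."""
    # One "add" operation per sidecar; "-" means "append to the array".
    patch = [
        {"op": "add", "path": "/spec/containers/-", "value": container}
        for container in sidecars
    ]
    encoded = base64.b64encode(json.dumps(patch).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,  # must echo the uid from the request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": encoded,
        },
    }
```

Decoding the patch with `base64.b64decode` followed by `json.loads` is a quick way to inspect what a webhook actually returned.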

TLS Certificate Management

The webhook requires TLS for secure communication with the API server. Proper certificate management is critical for production deployments.

Self-Signed Certificates (Development)

The Operator auto-generates certificates on startup using the cryptography library:

# Simplified certificate generation (see cluster.py for full implementation)
from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate key pair
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Self-signed: subject and issuer are the same name
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "lucid-operator")])
now = datetime.now(timezone.utc)

# Build certificate with SANs
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(private_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + timedelta(days=365))  # default validity: 365 days
    .add_extension(
        x509.SubjectAlternativeName([
            x509.DNSName("lucid-operator.lucid-system.svc"),
            x509.DNSName("lucid-operator.lucid-system.svc.cluster.local"),
            x509.DNSName("lucid-operator"),
        ]),
        critical=False,
    )
    .sign(private_key, hashes.SHA256())
)

The certificate and key are stored in a Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  name: lucid-operator-tls
  namespace: lucid-system
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded-certificate>
  tls.key: <base64-encoded-private-key>

cert-manager (Production)

For production deployments, use cert-manager for automated certificate lifecycle management.

Prerequisites

  1. Install cert-manager in your cluster:

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
    

  2. Create a ClusterIssuer (self-signed for internal use):

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: lucid-selfsigned-issuer
    spec:
      selfSigned: {}
    

Or use an internal CA:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: lucid-ca-issuer
spec:
  ca:
    secretName: lucid-ca-key-pair

Certificate Resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: lucid-operator-cert
  namespace: lucid-system
spec:
  secretName: lucid-operator-tls
  duration: 8760h    # 1 year
  renewBefore: 720h  # 30 days before expiry
  issuerRef:
    name: lucid-selfsigned-issuer
    kind: ClusterIssuer
  commonName: lucid-operator
  dnsNames:
    - lucid-operator
    - lucid-operator.lucid-system
    - lucid-operator.lucid-system.svc
    - lucid-operator.lucid-system.svc.cluster.local
  usages:
    - server auth
    - digital signature
    - key encipherment

Webhook Configuration with cert-manager

Use cert-manager's CA injector to automatically update the webhook's caBundle:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: lucid-operator-webhook
  annotations:
    cert-manager.io/inject-ca-from: lucid-system/lucid-operator-cert
webhooks:
  - name: mutate.lucid.computing
    clientConfig:
      service:
        name: lucid-operator
        namespace: lucid-system
        path: /mutate
      # caBundle is auto-injected by cert-manager
    # ... rest of config

Certificate Rotation

Self-Signed (Development)

The Operator monitors certificate expiration and regenerates before expiry:

  1. Certificate validity: 365 days by default
  2. Renewal check: Every 24 hours
  3. Renewal trigger: 30 days before expiry
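
The renewal rule above reduces to a simple date comparison. A minimal sketch, assuming the certificate's expiry is available as a timezone-aware datetime; `needs_renewal` is a hypothetical name, not the Operator's actual function:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEWAL_WINDOW = timedelta(days=30)  # renew 30 days before expiry


def needs_renewal(not_valid_after: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the certificate is within the renewal window (or expired)."""
    now = now or datetime.now(timezone.utc)
    return not_valid_after - now <= RENEWAL_WINDOW
```

A periodic task running every 24 hours would call this check and regenerate the certificate when it returns True.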

cert-manager (Production)

cert-manager handles rotation automatically:

  1. Monitors certificate expiration
  2. Renews based on renewBefore setting
  3. Updates the Secret with new certificate
  4. CA injector updates webhook caBundle

Important: Ensure the Operator pod restarts or reloads the certificate when it changes. Options:

  1. Reloader: Use stakater/Reloader to auto-restart pods:

    metadata:
      annotations:
        secret.reloader.stakater.com/reload: "lucid-operator-tls"
    

  2. File watch: The Operator watches the mounted certificate files for changes

Production Checklist

Before deploying to production, verify:

  • cert-manager installed and ClusterIssuer configured
  • Certificate resource created with appropriate DNS names
  • CA injection annotation added to webhook configurations
  • Certificate renewal tested (shorten duration temporarily)
  • Pod restart mechanism configured for certificate rotation
  • Monitoring set up for certificate expiration (Prometheus alerts)

Monitoring Certificates

Prometheus Metrics

cert-manager exposes metrics for certificate monitoring:

# Certificates expiring within 7 days
certmanager_certificate_expiration_timestamp_seconds - time() < 604800

# Certificates that are not Ready
certmanager_certificate_ready_status{condition="True"} == 0

Alert Example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: lucid-cert-alerts
spec:
  groups:
    - name: certificates
      rules:
        - alert: LucidOperatorCertExpiringSoon
          expr: |
            certmanager_certificate_expiration_timestamp_seconds{name="lucid-operator-cert"}
            - time() < 604800
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: Lucid Operator certificate expiring soon
            description: Certificate expires in {{ $value | humanizeDuration }}

Troubleshooting

Certificate Errors

Symptom: Pods fail to create with webhook errors

Error from server: error when creating "pod.yaml": Internal error occurred:
failed calling webhook "mutate.lucid.computing": x509: certificate signed by unknown authority

Resolution:

  1. Verify that caBundle in the webhook configuration matches the certificate
  2. Check cert-manager logs: kubectl logs -n cert-manager -l app=cert-manager
  3. Manually update caBundle if using self-signed certificates

Certificate Expired

Symptom: All pod creations fail after certificate expiry

Resolution:

  1. For self-signed: Restart the Operator to regenerate certificates
  2. For cert-manager: Check Certificate resource status

kubectl describe certificate lucid-operator-cert -n lucid-system

DNS Name Mismatch

Symptom: TLS handshake errors in logs

Resolution: Ensure all DNS names are in the certificate:

  • lucid-operator
  • lucid-operator.lucid-system
  • lucid-operator.lucid-system.svc
  • lucid-operator.lucid-system.svc.cluster.local
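
The required names follow the standard Kubernetes Service DNS pattern, so they can be derived from the Service name and namespace and compared against a certificate's SAN list. A small sketch; both helper names are hypothetical, and the cluster domain is assumed to be cluster.local:

```python
from typing import List


def service_dns_names(service: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> List[str]:
    """Return the DNS names under which a Service is reachable in-cluster."""
    return [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


def missing_sans(cert_dns_names: List[str], service: str, namespace: str) -> List[str]:
    """List the expected names that are absent from the certificate's SANs."""
    present = set(cert_dns_names)
    return [n for n in service_dns_names(service, namespace) if n not in present]
```

An empty result from `missing_sans` means the certificate covers every name listed above.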

Validation Webhook

After mutation, the Validating Webhook verifies that injected auditor images are trusted.

Validation Process

flowchart TD
    A[Pod Spec Received] --> B{Contains Auditor Containers?}
    B -->|No| C[Allow Pod]
    B -->|Yes| D[Extract Image Digests]
    D --> E{Strict Mode?}
    E -->|Yes| F{All Images Have Digests?}
    E -->|No| G[Check Notarization]
    F -->|No| H[Reject: Missing Digest]
    F -->|Yes| G
    G --> I{Query Verifier}
    I -->|Notarized| C
    I -->|Not Notarized| J{Strict Mode?}
    J -->|Yes| K[Reject: Not Notarized]
    J -->|No| L[Warn and Allow]

Strict Mode

In production (strict_notarization=true):

  • All auditor images must include a digest (e.g., image@sha256:abc123...)
  • All images must be registered in the Lucid Verifier's notarization registry
  • Pods are rejected if either check fails

In development (strict_notarization=false):

  • Images without digests are allowed with a warning
  • Unverifiable images are allowed if Verifier is unreachable (fail-open)
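
The decision logic can be sketched as a single function. This is an illustration under stated assumptions, not the Operator's implementation: `validate_images` is a hypothetical name, the `is_notarized` callback stands in for the Verifier query and returns None when the Verifier is unreachable, and rejecting on an unreachable Verifier in strict mode is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def validate_images(
    images: List[str],
    strict: bool,
    is_notarized: Callable[[str], Optional[bool]],
) -> Tuple[bool, List[str]]:
    """Return (allowed, messages) for a list of auditor image references."""
    warnings: List[str] = []
    for image in images:
        # A pinned reference looks like "repo/name@sha256:<hex>".
        _, sep, digest = image.partition("@")
        if not sep:
            if strict:
                return False, [f"missing digest: {image}"]
            warnings.append(f"no digest: {image}")
            continue

        status = is_notarized(digest)  # True, False, or None (unreachable)
        if status is True:
            continue
        reason = "verifier unreachable" if status is None else "not notarized"
        if strict:
            return False, [f"{reason}: {image}"]
        warnings.append(f"{reason}: {image}")  # fail-open in development
    return True, warnings
```

Returning the warnings alongside the decision lets the caller log them or surface them in the admission response.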

Notarization Check

The webhook queries the Verifier service:

GET /registry/check?digest=sha256:abc123...

Response:

{
  "digest": "sha256:abc123...",
  "notarized": true
}
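
A client for this endpoint needs to build the query URL and interpret the JSON body. The sketch below covers both steps without performing the HTTP call itself; the helper names and the Verifier base URL in the usage note are assumptions.

```python
import json
from urllib.parse import urlencode


def registry_check_url(base_url: str, digest: str) -> str:
    """Build the /registry/check URL for a digest (hypothetical helper)."""
    # Keep ":" unescaped so the query matches the form shown above.
    query = urlencode({"digest": digest}, safe=":")
    return f"{base_url.rstrip('/')}/registry/check?{query}"


def parse_registry_response(body: str, digest: str) -> bool:
    """Return True only if the response confirms this exact digest is notarized."""
    data = json.loads(body)
    return data.get("digest") == digest and data.get("notarized") is True
```

For instance, `registry_check_url("http://verifier:8080", "sha256:abc123")` yields `http://verifier:8080/registry/check?digest=sha256:abc123`, where the host and port are placeholders.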

Auditor Chain Architecture

Injected auditors form a linear processing chain using environment variables:

flowchart LR
    A["Application (port 8000)<br/>AUDITOR_URL=localhost:8081"] --> B["Injection (port 8081)<br/>AUDITOR_URL=localhost:8082"]
    B --> C["Toxicity (port 8082)<br/>AUDITOR_URL=localhost:5000"]
    C --> D["AI Model<br/>(port 5000)"]

Chain Configuration Priority

  1. Pod Annotation: lucid.computing/auditor-chain (comma-separated names)
  2. ConfigMap: lucid-auditor-chain in the namespace
  3. Default: lucid-guardrails-auditor, lucid-guardrails-auditor
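
The priority order can be expressed as a simple fallback chain. In this sketch, `resolve_chain` is a hypothetical helper, the ConfigMap key "chain" is an assumed name, and the default is passed in by the caller rather than hard-coded.

```python
from typing import List, Optional

CHAIN_ANNOTATION = "lucid.computing/auditor-chain"


def resolve_chain(
    annotations: dict,
    configmap_data: Optional[dict],
    default: List[str],
) -> List[str]:
    """Pick the auditor chain: pod annotation, then ConfigMap, then default."""
    raw = annotations.get(CHAIN_ANNOTATION)
    if raw is None and configmap_data is not None:
        raw = configmap_data.get("chain")  # assumed ConfigMap key
    if raw is None:
        return list(default)
    # Comma-separated names; ignore whitespace and empty entries.
    return [name.strip() for name in raw.split(",") if name.strip()]
```

Passing `configmap_data=None` models a namespace where the lucid-auditor-chain ConfigMap does not exist.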

Init Container

The webhook also injects a lucid-init container that sets up iptables rules to redirect traffic through the auditor chain:

iptables -t nat -N LUCID_REDIRECT
iptables -t nat -A LUCID_REDIRECT -p tcp -j REDIRECT --to-port 8081
iptables -t nat -A OUTPUT -p tcp --dport 8000 -j LUCID_REDIRECT

Failure Modes

Webhook Unreachable

With failurePolicy: Fail, the API server rejects pod creation in matching namespaces while the webhook is unreachable. For graceful degradation:

failurePolicy: Ignore  # Allow pods without auditors

Warning

Using Ignore means pods may run without audit protection.

Invalid Annotations

Invalid annotations are logged but don't block pod creation:

try:
    auditors = parse_annotations(pod)
except ValueError as e:
    logger.warning("invalid_annotations", error=str(e))
    auditors = []  # Continue without injection

Debugging

Check Webhook Registration

kubectl get mutatingwebhookconfigurations lucid-operator -o yaml

View Webhook Logs

kubectl logs -n lucid-system -l app=lucid-operator | grep webhook

Test Mutation

# Create a test pod (run in a namespace labeled lucid.computing/enabled=true)
kubectl run test-pod --image=nginx --labels="lucid.io/secured=true" --annotations="lucid.computing/auditors=injection"

# Check if sidecars were injected
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].name}'