
Troubleshooting

This section provides guidance for diagnosing and resolving common issues when working with the Lucid platform.

Quick Diagnostics

If you're having trouble with agents or auditors, start with these checks:

    $ lucid status
    Context: prod

    Agents (1):
    NAME       STATUS    MODEL    GPU
    my-agent   running   llama3   H100

    $ lucid status my-agent
    Agent: my-agent
    ID: agent-abc123
    Status: running
    Model: llama3
    GPU: H100

    $ lucid logs my-agent
    [2024-01-15 10:30:00] Agent started
    [2024-01-15 10:30:01] Auditors initialized
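
When these checks are scripted, the agent table printed by lucid status can be turned into records and filtered. The following is a minimal sketch, assuming the output keeps the whitespace-separated NAME STATUS MODEL GPU layout shown in the sample above. The helper names parse_status_table and agents_not_running are hypothetical and are not part of the Lucid CLI or SDK.

```python
def parse_status_table(output: str) -> list:
    """Parse the agent table from `lucid status` output into one dict per agent.

    Assumes a header row starting with NAME and whitespace-separated columns,
    as in the sample output; this layout is not a documented contract.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    header_index = next(
        (i for i, line in enumerate(lines) if line.startswith("NAME")), None
    )
    if header_index is None:
        return []  # No agent table found in the output.
    columns = [name.lower() for name in lines[header_index].split()]
    agents = []
    for row in lines[header_index + 1:]:
        values = row.split()
        if len(values) != len(columns):
            continue  # Skip rows that do not match the header layout.
        agents.append(dict(zip(columns, values)))
    return agents


def agents_not_running(output: str) -> list:
    """Return the names of agents whose status is anything other than 'running'."""
    return [a["name"] for a in parse_status_table(output) if a.get("status") != "running"]
```

Feeding the captured output of lucid status to agents_not_running gives a quick list of agents that need a closer look with lucid status <name> and lucid logs <name>.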

Common Issues

Agent Creation Fails

Symptoms:

  - lucid apply returns an error
  - Agent stuck in "pending" state

Solutions:

  1. Check authentication:

    $ lucid login -e user@example.com -p mypassword
    Logged in as user@example.com

  2. Verify quota availability:

     - Check that you have available GPU/compute quota in the selected region
     - Try a different region if resources are constrained

  3. Check auditor images are notarized:

    $ lucid auditor verify my-auditor:v1
    [+] Basic labels found.
    [+] Compliance probe successful!
    [*] Verification complete. Auditor is compliant.

Local Environment Issues

Symptoms:

  - lucid apply fails
  - Kind cluster not starting

Solutions:

  1. Check prerequisites:

    docker --version
    kind --version
    kubectl version --client

  2. Check cluster status:

    $ lucid status
    Context: dev
    Cluster: lucid-local-k8s
    Status: Running

  3. Tear down and recreate:

    lucid teardown -y
    kind delete cluster --name lucid-local-k8s
    kind create cluster --name lucid-local-k8s
    lucid apply -f my-environment.yaml -y
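
The prerequisite check can also be run from a script before calling lucid apply. This is a small sketch using only the Python standard library; it confirms that the three tools are on PATH but does not check their versions, and the function name missing_tools is hypothetical.

```python
import shutil

# Tools that the local environment checks above rely on.
REQUIRED_TOOLS = ("docker", "kind", "kubectl")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means all three binaries were found; anything else names the tools to install before recreating the cluster.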

Auditor Verification Fails

Symptoms:

  - lucid auditor verify reports errors
  - Missing labels or endpoints

Solutions:

  1. Ensure required OCI labels:

    LABEL io.lucid.auditor="true"
    LABEL io.lucid.schema_version="1.0"
    LABEL io.lucid.phase="request"
    LABEL io.lucid.interfaces="health,audit"
    

  2. Implement required endpoints:

     - GET /health: must return 200 OK with {"status": "ok"}
     - POST /audit: main audit endpoint

  3. Check the container runs as non-root:

    USER 1001
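
To make the endpoint requirements above concrete, here is a minimal sketch of an auditor that serves GET /health and POST /audit using only the Python standard library. The /health body matches the requirement stated above. The /audit response body is a placeholder assumption, since the real response schema is not described in this section, and the names AuditorHandler and make_server are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AuditorHandler(BaseHTTPRequestHandler):
    """Serves the two endpoints that auditor verification looks for."""

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/audit":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            request = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        if not isinstance(request, dict):
            self._send_json(400, {"error": "expected a JSON object"})
            return
        # Assumption: placeholder response; replace with the real audit result.
        messages = request.get("messages", [])
        self._send_json(200, {"decision": "allow", "message_count": len(messages)})

    def log_message(self, format, *args) -> None:
        pass  # Keep the example quiet; remove to get request logs.


def make_server(host: str = "0.0.0.0", port: int = 8090) -> ThreadingHTTPServer:
    """Build the server; call serve_forever() on the result to start it."""
    return ThreadingHTTPServer((host, port), AuditorHandler)
```

Running make_server().serve_forever() and then requesting /health on port 8090 should return the expected {"status": "ok"} body, which is a quick way to rule out the endpoint itself before rerunning lucid auditor verify.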
    

Passport Shows "Not Attested"

Symptoms:

  - AI Passport shows hardware_attested: false
  - Missing TEE information

Solutions:

  1. Verify agent is running on TEE hardware:

    $ lucid passport show <passport-id>
    Passport ID: pass-001
    Hardware Attested: true
    TEE Type: AMD SEV-SNP

  2. Contact support if you expected hardware attestation but it's not present.

CLI Authentication Issues

Symptoms:

  - 401 Unauthorized errors
  - Token expired messages

Solutions:

  1. Re-authenticate:

    $ lucid login -e user@example.com -p mypassword
    Logged in as user@example.com

  2. Generate a new API key:

    $ lucid login -e user@example.com -p mypassword --generate-key
    API Key: luc_xxxxxxxxxxxxxxxx

  3. Check the credentials file. Credentials are stored in ~/.lucid/config.yaml.

Advanced Troubleshooting

Policy Evaluation Errors

Symptoms:

  - Requests blocked unexpectedly
  - Policy evaluation returns errors
  - Claims not matching expected values

Solutions:

  1. Check policy syntax:

    # Validate policy file locally
    lucid policy validate my-policy.yaml
    

  2. Enable policy debug logging:

    # In your auditor configuration
    env:
      LUCID_LOG_LEVEL: debug
      LUCID_POLICY_DEBUG: "true"
    

  3. Inspect claim values in logs:

    lucid logs my-agent | grep "policy_evaluation"
    

  4. Common policy issues:

| Error | Cause | Fix |
|---|---|---|
| KeyError: 'claim_name' | Claim not produced by auditor | Check auditor produces required claims |
| TypeError in condition | Type mismatch in rule | Ensure claim types match (string vs int) |
| PolicyNotFound | Policy not registered | Run lucid policy push to register |

  5. Test policy locally:

    from lucid_sdk import PolicyEngine, load_policy
    
    policy = load_policy("my-policy.yaml")
    engine = PolicyEngine(policy)
    
    # Test with sample claims
    test_claims = {"location.country": {"value": "US"}}
    result = engine.evaluate(test_claims)
    print(f"Decision: {result.decision}")
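
The two most common policy errors listed above, a missing claim and a type mismatch, can be caught before evaluation with a plain-Python check. This is a sketch that does not use lucid_sdk; it assumes claims have the {"name": {"value": ...}} shape used in the sample test claims, and the function name check_claims and the claim names risk.score and user.age are hypothetical.

```python
def check_claims(claims: dict, expected_types: dict) -> list:
    """Compare claims against the types a policy expects.

    `claims` maps claim name -> {"value": ...}; `expected_types` maps
    claim name -> Python type. Returns one message per problem found.
    """
    problems = []
    for name, expected in expected_types.items():
        if name not in claims:
            # Corresponds to the KeyError case: the auditor did not emit it.
            problems.append(f"missing claim: {name}")
            continue
        value = claims[name].get("value")
        if not isinstance(value, expected):
            # Corresponds to the TypeError case: e.g. "7" (str) where int is expected.
            problems.append(
                f"type mismatch for {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems
```

For example, check_claims({"risk.score": {"value": "7"}}, {"risk.score": int}) reports a type mismatch, which points at a string-versus-integer problem in the auditor output rather than in the policy file.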
    

Multi-Auditor Chain Debugging

Symptoms:

  - Requests failing at unknown point in chain
  - Auditors timing out
  - Chain not executing in expected order

Solutions:

  1. Check chain configuration:
    lucid status my-agent --verbose
    

Output shows the auditor chain order and ports:

    Audit Chain:
      1. injection-auditor (port 8090) → 8093
      2. toxicity-auditor (port 8093) → 5000
      3. model (port 5000)

  2. Enable request tracing:

    # In environment YAML
    services:
      observability:
        enabled: true
        tracing: true

  3. View per-auditor timing:

    lucid logs my-agent --filter auditor

Look for timing information:

    [injection-auditor] Request processed in 45ms
    [toxicity-auditor] Request processed in 120ms

  4. Debug an individual auditor:

    # Test auditor directly (bypass chain)
    curl -X POST http://localhost:8090/audit \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "test"}]}'

  5. Common chain issues:

| Symptom | Cause | Fix |
|---|---|---|
| First auditor works, second fails | Port misconfiguration | Check LUCID_CHAIN_NEXT env var |
| Timeout after specific auditor | Auditor crash/hang | Check auditor logs, increase timeout |
| Wrong auditor order | YAML ordering | Auditors execute in chain list order |
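
Port misconfiguration can be spotted mechanically from the verbose status output: each stage should forward to the port the next stage listens on. The sketch below assumes the "N. name (port P) → Q" line format shown in the sample Audit Chain output; the function names parse_chain and find_wiring_errors are hypothetical.

```python
import re

# Matches lines such as "  1. injection-auditor (port 8090) → 8093".
# The forwarding part is optional because the final stage has no next hop.
STAGE_PATTERN = re.compile(
    r"^\s*\d+\.\s+(?P<name>\S+)\s+\(port\s+(?P<port>\d+)\)"
    r"(?:\s*→\s*(?P<next>\d+))?\s*$"
)


def parse_chain(output: str) -> list:
    """Extract (name, listen_port, next_port) tuples from verbose status output."""
    stages = []
    for line in output.splitlines():
        match = STAGE_PATTERN.match(line)
        if match:
            next_port = int(match["next"]) if match["next"] else None
            stages.append((match["name"], int(match["port"]), next_port))
    return stages


def find_wiring_errors(stages: list) -> list:
    """Report stages whose forwarding port differs from the next stage's listen port."""
    errors = []
    for (name, _, next_port), (next_name, next_listen, _) in zip(stages, stages[1:]):
        if next_port != next_listen:
            errors.append(
                f"{name} forwards to {next_port}, but {next_name} listens on {next_listen}"
            )
    return errors
```

An empty result from find_wiring_errors(parse_chain(output)) means the ports line up, so a failure at the second auditor is more likely a crash or timeout than a wiring problem.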

Evidence Submission Failures

Symptoms:

  - Evidence not appearing in Verifier
  - EvidenceSubmissionError in logs
  - AI Passport missing claims

Solutions:

  1. Check Verifier connectivity:

    # From auditor pod
    curl https://verifier.lucid.sh/health
    

  2. Verify evidence schema:

    from lucid_schemas import Evidence
    
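    # Validate evidence before submission. Assuming Evidence is a Pydantic
    # model (as model_dump suggests), the constructor itself raises
    # ValidationError if a field is missing or malformed.
    evidence = Evidence(
        auditor_id="my-auditor",
        claims=[...],
        # ...
    )
    evidence.model_dump()  # Returns the validated evidence as a dict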
    

  3. Check authentication:

    # Verify API key is set
    echo $LUCID_API_KEY
    

  4. Common evidence errors:

| Error | Cause | Fix |
|---|---|---|
| ValidationError: claims | Invalid claim format | Use lucid_schemas.Claim type |
| 401 Unauthorized | Missing/invalid API key | Set LUCID_API_KEY env var |
| 413 Payload Too Large | Evidence too large | Reduce claim data size |
| 409 Conflict | Duplicate evidence ID | Ensure unique evidence_id |

  5. Enable submission logging:

    env:
      LUCID_EVIDENCE_DEBUG: "true"
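
Two of the errors above, 409 Conflict and 413 Payload Too Large, can be guarded against on the client side before submission. The following sketch builds a plain dictionary rather than a lucid_schemas.Evidence object. The field names evidence_id, auditor_id, and claims come from this section, but the overall payload shape and the max_bytes default are assumptions, since the Verifier's actual size limit is not stated here. The function name build_evidence_payload is hypothetical.

```python
import json
import uuid


def build_evidence_payload(auditor_id: str, claims: list, max_bytes: int = 1_000_000) -> dict:
    """Build an evidence payload with a fresh ID and check its serialized size.

    max_bytes is an assumed limit for illustration, not the Verifier's real one.
    """
    payload = {
        "evidence_id": str(uuid.uuid4()),  # Random ID avoids duplicate-ID conflicts.
        "auditor_id": auditor_id,
        "claims": claims,
    }
    size = len(json.dumps(payload).encode("utf-8"))
    if size > max_bytes:
        raise ValueError(
            f"evidence payload is {size} bytes, over the {max_bytes}-byte limit"
        )
    return payload
```

Failing early with a ValueError makes it clear that the claim data needs trimming, instead of surfacing later as an EvidenceSubmissionError in the auditor logs.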
    

Attestation Verification Failures

Symptoms:

  - lucid verify fails
  - Attestation report invalid
  - Hardware quote verification error

Solutions:

  1. Check attestation status:

    lucid verify environment env-abc123 --verbose
    

  2. Verify hardware support:

    # Check if running on TEE-capable hardware
    lucid status my-agent --hardware
    

Expected output for valid TEE:

Hardware: AMD EPYC (SEV-SNP enabled)
TEE Status: Active
Quote Valid: true

  3. Common attestation errors:

| Error | Cause | Fix |
|---|---|---|
| QuoteVerificationFailed | Invalid hardware quote | Ensure TEE hardware is genuine |
| MeasurementMismatch | Code tampered | Redeploy from verified image |
| CertificateExpired | Old attestation cert | Request fresh attestation |
| NetworkError | Can't reach Intel/AMD | Check firewall allows attestation endpoints |

  4. Mock mode issues:

If using TEE_PROVIDER=MOCK (local dev environment), attestation will show hardware_attested: false. This is expected for local development.

  5. Refresh attestation:

    # Force new attestation report
    lucid verify environment env-abc123 --refresh
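
The mock-mode rule above can be encoded so that automated checks do not flag local development environments. This is a small sketch; it assumes the provider is exposed through the TEE_PROVIDER environment variable mentioned above, and the function name attestation_needs_attention is hypothetical.

```python
import os
from typing import Optional


def attestation_needs_attention(hardware_attested: bool, tee_provider: Optional[str] = None) -> bool:
    """Return True when a missing hardware attestation is unexpected.

    With TEE_PROVIDER=MOCK, hardware_attested: false is the expected result,
    so only non-mock providers are flagged.
    """
    if tee_provider is None:
        tee_provider = os.environ.get("TEE_PROVIDER", "")
    if hardware_attested:
        return False
    return tee_provider.upper() != "MOCK"
```

A True result is the case worth following up with lucid verify environment and, if needed, a refreshed attestation report.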
    

Operator Webhook Issues

Symptoms:

  - Pods not getting auditor sidecars
  - Webhook timeout errors
  - Certificate errors

Solutions:

  1. Check webhook registration:

    kubectl get mutatingwebhookconfigurations lucid-operator -o yaml
    

  2. Verify namespace label:

    # Namespace must have this label for injection
    kubectl get namespace my-namespace --show-labels | grep lucid
    

Required label:

    labels:
      lucid.computing/enabled: "true"

  3. Check operator logs:

    kubectl logs -n lucid-system -l app=lucid-operator | grep webhook

  4. Certificate issues:

    # Check certificate validity
    kubectl get secret lucid-operator-tls -n lucid-system -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
    

See Operator Webhook for detailed webhook documentation.
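
The certificate check in the last step prints notBefore= and notAfter= lines, which can be compared against the current time in a script. The sketch below parses that output from openssl x509 -noout -dates, which reports times in GMT. The function names parse_openssl_dates and certificate_is_valid are hypothetical.

```python
from datetime import datetime, timezone


def parse_openssl_dates(output: str) -> dict:
    """Parse 'notBefore=...' and 'notAfter=...' lines into UTC datetimes."""
    dates = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if separator and key in ("notBefore", "notAfter"):
            # Example value: "Jan 15 10:30:00 2024 GMT"
            text = value.strip().removesuffix(" GMT")
            parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y")
            dates[key] = parsed.replace(tzinfo=timezone.utc)
    return dates


def certificate_is_valid(output: str, now=None) -> bool:
    """Return True if `now` (default: current UTC time) is inside the validity window."""
    now = now or datetime.now(timezone.utc)
    dates = parse_openssl_dates(output)
    return dates["notBefore"] <= now <= dates["notAfter"]
```

If certificate_is_valid returns False for the webhook certificate, the certificate errors in the operator logs are most likely caused by an expired or not-yet-valid TLS secret.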


Health Check Endpoints

All Lucid services expose health endpoints that you can use for debugging:

| Service | Health Endpoint |
|---|---|
| Verifier API | https://verifier.lucid.sh/health |
| Observer UI | https://observer.lucid.sh/api/health |
| Auditors | /health on configured port |
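
These endpoints can be polled from a script when several services need checking at once. The following is a minimal sketch using the Python standard library; it treats any HTTP 200 as healthy and does not inspect the response body. The function name check_health and the shape of the returned dictionary are hypothetical.

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> dict:
    """Request a health endpoint and summarize the outcome.

    Returns a dict with the URL, an 'ok' flag (True only for HTTP 200),
    the HTTP status (None if no response was received), and a detail string.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return {"url": url, "ok": response.status == 200,
                    "status": response.status, "detail": body}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return {"url": url, "ok": False, "status": exc.code, "detail": exc.reason}
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar network errors.
        return {"url": url, "ok": False, "status": None, "detail": str(exc)}
```

Calling check_health on the Verifier API and Observer UI URLs from the table, and on http://localhost:<port>/health for each auditor, separates services that are down (status is None) from services that are reachable but unhealthy.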

Getting Help

If you can't resolve an issue:

  1. Check the GitHub Issues
  2. Search existing discussions
  3. Contact support at support@lucid.sh with:

     - Error messages (redact sensitive data)
     - Steps to reproduce
     - Agent ID and passport IDs if applicable