# Troubleshooting

This section provides guidance for diagnosing and resolving common issues when working with the Lucid platform.

## Quick Diagnostics

If you're having trouble with agents or auditors, start with these checks:

```
Agents (1):
NAME      STATUS   MODEL   GPU
my-agent  running  llama3  H100
```

```shell
lucid status my-agent
```

```
Agent:  my-agent
ID:     agent-abc123
Status: running
Model:  llama3
GPU:    H100
```

```shell
lucid logs my-agent
```

```
[2024-01-15 10:30:00] Agent started
[2024-01-15 10:30:01] Auditors initialized
```
## Common Issues

### Agent Creation Fails

Symptoms:

- `lucid apply` returns an error
- Agent stuck in "pending" state

Solutions:

- Check authentication.
- Verify quota availability:
  - Check that you have available GPU/compute quota in the selected region
  - Try a different region if resources are constrained
- Check that auditor images are notarized:

```
[+] Compliance probe successful!
[*] Verification complete. Auditor is compliant.
```
### Local Environment Issues

Symptoms:

- `lucid apply` fails
- Kind cluster not starting

Solutions:

- Check prerequisites.
- Check cluster status:

```
Cluster: lucid-local-k8s
Status: Running
```

- Teardown and recreate.
### Auditor Verification Fails

Symptoms:

- `lucid auditor verify` reports errors
- Missing labels or endpoints

Solutions:

- Ensure the required OCI labels are present:

```dockerfile
LABEL io.lucid.auditor="true"
LABEL io.lucid.schema_version="1.0"
LABEL io.lucid.phase="request"
LABEL io.lucid.interfaces="health,audit"
```

- Implement the required endpoints:
  - `GET /health` - Must return `200 OK` with `{"status": "ok"}`
  - `POST /audit` - Main audit endpoint

- Check that the container runs as non-root:

```dockerfile
USER 1001
```
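The two required endpoints can be sketched with Python's standard library. This is an illustrative stand-in, not the real auditor interface: the port, payload shapes, and the `"decision"` response field are assumptions here; only the `/health` contract (`200 OK` with `{"status": "ok"}`) comes from the requirements above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AuditorHandler(BaseHTTPRequestHandler):
    def _send_json(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /health must return 200 OK with {"status": "ok"}
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # POST /audit is the main audit endpoint
        if self.path != "/audit":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # A real auditor would inspect `request` and emit claims here.
        self._send_json(200, {"decision": "allow"})


# To run: HTTPServer(("127.0.0.1", 8090), AuditorHandler).serve_forever()
```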
### Passport Shows "Not Attested"

Symptoms:

- AI Passport shows `hardware_attested: false`
- Missing TEE information

Solutions:

- Verify the agent is running on TEE hardware:

```
Hardware Attested: true
TEE Type: AMD SEV-SNP
```

- Contact support if you expected hardware attestation but it's not present.
### CLI Authentication Issues

Symptoms:

- `401 Unauthorized` errors
- Token expired messages

Solutions:

- Re-authenticate.
- Generate a new API key.
- Check the credentials file. Credentials are stored in `~/.lucid/config.yaml`.
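As a quick sanity check, you can confirm the credentials file actually contains a key. This sketch assumes a flat `api_key: <value>` line in `~/.lucid/config.yaml`; the real file layout may differ, and a proper YAML parser would be preferable in production:

```python
from pathlib import Path
from typing import Optional


def read_api_key(config_path: Path) -> Optional[str]:
    """Return the api_key value from a flat `key: value` config file, or None."""
    if not config_path.exists():
        return None
    for line in config_path.read_text().splitlines():
        line = line.strip()
        if line.startswith("api_key:"):
            # Strip surrounding quotes, if any
            return line.split(":", 1)[1].strip().strip('"')
    return None


# e.g. read_api_key(Path.home() / ".lucid" / "config.yaml")
```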
## Advanced Troubleshooting

### Policy Evaluation Errors

Symptoms:

- Requests blocked unexpectedly
- Policy evaluation returns errors
- Claims not matching expected values

Solutions:

- Check policy syntax:

```shell
# Validate policy file locally
lucid policy validate my-policy.yaml
```

- Enable policy debug logging:

```yaml
# In your auditor configuration
env:
  LUCID_LOG_LEVEL: debug
  LUCID_POLICY_DEBUG: "true"
```

- Inspect claim values in logs:

```shell
lucid logs my-agent | grep "policy_evaluation"
```

- Common policy issues:

| Error | Cause | Fix |
|---|---|---|
| `KeyError: 'claim_name'` | Claim not produced by auditor | Check auditor produces required claims |
| `TypeError` in condition | Type mismatch in rule | Ensure claim types match (string vs int) |
| `PolicyNotFound` | Policy not registered | Run `lucid policy push` to register |

- Test the policy locally:

```python
from lucid_sdk import PolicyEngine, load_policy

policy = load_policy("my-policy.yaml")
engine = PolicyEngine(policy)

# Test with sample claims
test_claims = {"location.country": {"value": "US"}}
result = engine.evaluate(test_claims)
print(f"Decision: {result.decision}")
```
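The `KeyError` and `TypeError` rows in the table above come from how claims are looked up and compared during evaluation. A toy illustration of that logic (not the real `PolicyEngine`; the claim shape mirrors the `{"value": ...}` structure used in the local test above):

```python
def evaluate_rule(claims: dict, claim_name: str, expected) -> bool:
    # Raises KeyError('claim_name') if the auditor never produced this claim
    value = claims[claim_name]["value"]
    # Raises TypeError if the claim's type doesn't match the rule's (string vs int)
    if type(value) is not type(expected):
        raise TypeError(
            f"{claim_name}: {type(value).__name__} vs {type(expected).__name__}"
        )
    return value == expected


claims = {"location.country": {"value": "US"}}
```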
### Multi-Auditor Chain Debugging

Symptoms:

- Requests failing at an unknown point in the chain
- Auditors timing out
- Chain not executing in expected order

Solutions:

- Check the chain configuration:

```shell
lucid status my-agent --verbose
```

Output shows the auditor chain order and ports:

```
Audit Chain:
  1. injection-auditor (port 8090) → 8093
  2. toxicity-auditor (port 8093) → 5000
  3. model (port 5000)
```

- Enable request tracing:

```yaml
# In environment YAML
services:
  observability:
    enabled: true
    tracing: true
```

- View per-auditor timing:

```shell
lucid logs my-agent --filter auditor
```

Look for timing information:

```
[injection-auditor] Request processed in 45ms
[toxicity-auditor] Request processed in 120ms
```

- Debug an individual auditor:

```shell
# Test auditor directly (bypass chain)
curl -X POST http://localhost:8090/audit \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "test"}]}'
```

- Common chain issues:

| Symptom | Cause | Fix |
|---|---|---|
| First auditor works, second fails | Port misconfiguration | Check `LUCID_CHAIN_NEXT` env var |
| Timeout after specific auditor | Auditor crash/hang | Check auditor logs, increase timeout |
| Wrong auditor order | YAML ordering | Auditors execute in chain list order |
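The chain behaves like a sequence of hops: each auditor either blocks the request or forwards it to the next address (the `LUCID_CHAIN_NEXT` value), ending at the model. A toy Python sketch of that control flow, using the auditor names from the example above; the lambdas are stand-ins for real HTTP calls, and the `"allow"`/`"deny"` decisions are illustrative:

```python
def run_chain(auditors, request: str) -> str:
    """Run auditors in chain list order; stop at the first one that blocks."""
    for name, audit in auditors:
        if audit(request) != "allow":
            return f"blocked by {name}"
    return "delivered to model"


chain = [
    ("injection-auditor", lambda r: "allow"),
    ("toxicity-auditor", lambda r: "deny" if "toxic" in r else "allow"),
]
```

Because evaluation short-circuits at the first blocking auditor, a failure "at an unknown point" is usually pinned down by checking which auditor's log is the last to show the request.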
### Evidence Submission Failures

Symptoms:

- Evidence not appearing in the Verifier
- `EvidenceSubmissionError` in logs
- AI Passport missing claims

Solutions:

- Check Verifier connectivity:

```shell
# From the auditor pod
curl https://verifier.lucid.sh/health
```

- Verify the evidence schema:

```python
from lucid_schemas import Evidence

# Validate evidence before submission
evidence = Evidence(
    auditor_id="my-auditor",
    claims=[...],
    # ...
)
evidence.model_dump()  # Raises ValidationError if invalid
```

- Check authentication:

```shell
# Verify the API key is set
echo $LUCID_API_KEY
```

- Common evidence errors:

| Error | Cause | Fix |
|---|---|---|
| `ValidationError: claims` | Invalid claim format | Use the `lucid_schemas.Claim` type |
| `401 Unauthorized` | Missing/invalid API key | Set the `LUCID_API_KEY` env var |
| `413 Payload Too Large` | Evidence too large | Reduce claim data size |
| `409 Conflict` | Duplicate evidence ID | Ensure a unique `evidence_id` |

- Enable submission logging:

```yaml
env:
  LUCID_EVIDENCE_DEBUG: "true"
```
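For the `409 Conflict` row above, each submission needs a unique `evidence_id`. Assuming the schema accepts any unique string (the ID format here is an illustration, not a documented requirement), a random UUID avoids collisions:

```python
import uuid


def new_evidence_id(auditor_id: str) -> str:
    # uuid4 is random (122 bits of entropy), so repeated submissions
    # from the same auditor won't produce duplicate IDs
    return f"{auditor_id}-{uuid.uuid4()}"
```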
### Attestation Verification Failures

Symptoms:

- `lucid verify` fails
- Attestation report invalid
- Hardware quote verification error

Solutions:

- Check attestation status:

```shell
lucid verify environment env-abc123 --verbose
```

- Verify hardware support:

```shell
# Check if running on TEE-capable hardware
lucid status my-agent --hardware
```

Expected output for a valid TEE:

```
Hardware: AMD EPYC (SEV-SNP enabled)
TEE Status: Active
Quote Valid: true
```

- Common attestation errors:

| Error | Cause | Fix |
|---|---|---|
| `QuoteVerificationFailed` | Invalid hardware quote | Ensure TEE hardware is genuine |
| `MeasurementMismatch` | Code tampered | Redeploy from a verified image |
| `CertificateExpired` | Old attestation cert | Request fresh attestation |
| `NetworkError` | Can't reach Intel/AMD | Check firewall allows attestation endpoints |

- Mock mode issues: if using `TEE_PROVIDER=MOCK` (local dev environment), attestation will show `hardware_attested: false`. This is expected for local development.

- Refresh attestation:

```shell
# Force a new attestation report
lucid verify environment env-abc123 --refresh
```
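Conceptually, `MeasurementMismatch` means the hash of what is actually running no longer equals the measurement recorded in the attestation report, which is why redeploying from a verified image fixes it. A simplified illustration (real TEE measurements are produced by the hardware over the launched image, not computed like this):

```python
import hashlib


def measure(code: bytes) -> str:
    # A measurement is a cryptographic hash of the launched code/image
    return hashlib.sha384(code).hexdigest()


def verify_measurement(report_measurement: str, running_code: bytes) -> bool:
    # MeasurementMismatch corresponds to this check returning False
    return report_measurement == measure(running_code)
```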
### Operator Webhook Issues

Symptoms:

- Pods not getting auditor sidecars
- Webhook timeout errors
- Certificate errors

Solutions:

- Check webhook registration:

```shell
kubectl get mutatingwebhookconfigurations lucid-operator -o yaml
```

- Verify the namespace label:

```shell
# The namespace must have this label for injection
kubectl get namespace my-namespace --show-labels | grep lucid
```

Required label:

```yaml
labels:
  lucid.computing/enabled: "true"
```

- Check operator logs:

```shell
kubectl logs -n lucid-system -l app=lucid-operator | grep webhook
```

- Certificate issues:

```shell
# Check certificate validity
kubectl get secret lucid-operator-tls -n lucid-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
```

See Operator Webhook for detailed webhook documentation.
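The `openssl x509 -noout -dates` command above prints `notBefore=`/`notAfter=` lines. If you script this check, the `notAfter=` line can be turned into days remaining (assuming OpenSSL's default GMT date format; this helper is illustrative, not part of the Lucid tooling):

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after_line: str, now: Optional[datetime] = None) -> int:
    """Parse a line like 'notAfter=Jun  1 12:00:00 2026 GMT' into days left."""
    stamp = not_after_line.split("=", 1)[1].replace(" GMT", "").strip()
    expires = datetime.strptime(stamp, "%b %d %H:%M:%S %Y").replace(
        tzinfo=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days  # negative means already expired
```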
## Health Check Endpoints

All Lucid services expose health endpoints that you can use for debugging:

| Service | Health Endpoint |
|---|---|
| Verifier API | `https://verifier.lucid.sh/health` |
| Observer UI | `https://observer.lucid.sh/api/health` |
| Auditors | `/health` on the configured port |
## Getting Help

If you can't resolve an issue:

- Check the GitHub Issues
- Search existing discussions
- Contact support at support@lucid.sh with:
  - Error messages (redact sensitive data)
  - Steps to reproduce
  - Agent ID and passport IDs if applicable