Error Detection
Retrace turns recorded agent runs into detected, persisted, replay-verified failures — automatically.
Recording an agent is step one. Retrace also detects what went wrong — automatically, on every trace — and persists each finding so you get history, alerting, and a triage queue instead of a wall of logs. Detection runs identically whether spans arrive over WebSocket, HTTP, or OpenTelemetry.
Everything below is live today. Browse findings at /detections, or inline on any trace's
timeline.
What gets detected
| Detector | Failure mode | How it works | Cost |
|---|---|---|---|
| Hallucination (tiered) | hallucination | Grounding + mutual-information scoring of model output | hot-path + Tier-3 worker |
| Tool-output hallucination | hallucination | Compares the model's claim about a tool result to the verbatim recorded result | sampled LLM judge |
| Schema validation | schema_violation | Validates tool-call args against the declared tool schema, and structured outputs against the learned schema | deterministic, every trace |
| Loop / non-progress | loop | Identical tool-call hash repeated N times, step-count outliers, reasoning stalls | deterministic, every trace |
| Goal drift | goal_drift | LLM judge over the full conversation: did the agent stay on the original objective? | sampled LLM judge |
| Context loss | context_loss | Detects dropped turn-1 constraints past long conversations | sampled LLM judge |
| Replay divergence | divergence | Re-executes the run and structurally diffs (tool graph, retrieved docs, sampling config) | on demand |
| Regression (golden) | regression | Replays a golden trace against a new prompt/model and flags structural regressions | on demand / CI |
| Root-cause chain | root_cause | Walks the span dependency graph backward to the earliest corrupted step | on demand |
| Distribution drift | drift | Scheduled MMD drift vs a rolling baseline; auto-pivots a re-cluster | scheduled worker |
| Probabilistic anomaly | anomaly | Heavy-hitter tool loops, cardinality drift, duplicate spans | hot-path |
| Guardrail violation | guardrail_violation | Live policy breaches (cost / loop / context / latency / error-rate) | hot-path |
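The "identical tool-call hash repeated N times" check in the loop detector row can be illustrated with a short sketch. This is not Retrace's implementation: the function names, the SHA-256 over canonical JSON, the default threshold of 3, and counting repeats across the whole trace (rather than only consecutive ones) are all assumptions made for the example.

```python
import hashlib
import json
from collections import Counter


def tool_call_hash(name: str, args: dict) -> str:
    """Stable digest of a tool call: same tool and same args give the same hash."""
    # sort_keys makes the hash independent of argument key order.
    canonical = json.dumps({"tool": name, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_loops(tool_calls: list[tuple[str, dict]], threshold: int = 3) -> dict[str, int]:
    """Return {hash: count} for every tool call seen at least `threshold` times."""
    counts = Counter(tool_call_hash(name, args) for name, args in tool_calls)
    return {h: n for h, n in counts.items() if n >= threshold}


# Hypothetical trace: the same search issued three times, then one fetch.
calls = [
    ("search", {"q": "refund policy", "top_k": 5}),
    ("search", {"top_k": 5, "q": "refund policy"}),  # same call, different key order
    ("search", {"q": "refund policy", "top_k": 5}),
    ("fetch", {"url": "https://example.com"}),
]

loops = find_loops(calls)
print(list(loops.values()))  # [3]
```

Because the check is a pure function of the recorded tool calls, it needs no model call, which is consistent with the "deterministic, every trace" cost listed in the table.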
Severity & dedup
Every detector funnels through one durable write path. One logical failure = one detection row
— a 2,000-iteration loop is a single loop detection whose count updates in place, not 2,000 rows.
Severity scale: critical › high › medium › low › info. New high/critical detections
email the account owner (deduped per trace/detector per month).
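The dedup rule above can be pictured as an upsert keyed on the logical failure. The sketch below is an in-memory illustration only; the `DetectionStore` class, the `(trace_id, detector)` key, and the `month` string are assumptions for the example, not Retrace's storage schema.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    trace_id: str
    detector: str
    severity: str
    count: int = 1


class DetectionStore:
    """Hypothetical in-memory stand-in for the durable write path."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, str], Detection] = {}
        self.emailed: set[tuple[str, str, str]] = set()

    def record(self, trace_id: str, detector: str, severity: str, month: str) -> bool:
        """Upsert one finding; return True if an alert email would be sent."""
        key = (trace_id, detector)
        row = self.rows.get(key)
        if row is not None:
            row.count += 1  # repeat of a known failure: update in place
            return False
        self.rows[key] = Detection(trace_id, detector, severity)
        alert_key = (trace_id, detector, month)
        if severity in ("high", "critical") and alert_key not in self.emailed:
            self.emailed.add(alert_key)  # at most one email per trace/detector/month
            return True
        return False


store = DetectionStore()
# A 2,000-iteration loop reported once per iteration.
emails = sum(store.record("trace-1", "loop", "high", "2025-01") for _ in range(2000))
print(len(store.rows), store.rows[("trace-1", "loop")].count, emails)  # 1 2000 1
```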
The detections feed
# List (tenant-scoped) — filter by trace, detector, failure_mode, severity, status, date
curl "https://api.retraceai.tech/api/v1/detections?failure_mode=loop&severity=high" \
-H "x-retrace-key: rt_live_..."
# Aggregate counts for dashboards
curl https://api.retraceai.tech/api/v1/detections/summary -H "x-retrace-key: rt_live_..."
# Triage
curl -X PATCH https://api.retraceai.tech/api/v1/detections/<id> \
-H "x-retrace-key: rt_live_..." -d '{"status":"resolved"}'
Verify replay & divergence (2A)
Re-execute a trace and structurally diff it against the recording. Divergence is ranked by the first divergent step, not raw character diff.
curl -X POST https://api.retraceai.tech/api/v1/traces/<id>/verify-replay \
-H "x-retrace-key: rt_live_..."
In the web app, hit Verify Replay on any trace to see the divergence score and jump to the first divergent step.
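To make "first divergent step" concrete, here is a minimal sketch that compares a recorded run and a replay step by step. The step representation (a `(kind, name)` tuple that ignores generated text) and the function name are assumptions for illustration; the actual diff also covers retrieved docs and sampling config, which this sketch omits.

```python
from itertools import zip_longest
from typing import Optional

Step = tuple[str, str]  # (kind, name), e.g. ("tool", "search")


def first_divergent_step(recorded: list[Step], replayed: list[Step]) -> Optional[int]:
    """Index of the first step whose structure differs, or None if the runs match."""
    # zip_longest pads the shorter run with None, so a missing step counts as divergence.
    for i, (a, b) in enumerate(zip_longest(recorded, replayed)):
        if a != b:
            return i
    return None


recorded = [("llm", "plan"), ("tool", "search"), ("tool", "fetch"), ("llm", "answer")]
replay_a = [("llm", "plan"), ("tool", "search"), ("tool", "summarize"), ("llm", "answer")]
replay_b = [("llm", "plan"), ("tool", "lookup"), ("tool", "fetch"), ("llm", "answer")]

print(first_divergent_step(recorded, replay_a))  # 2
print(first_divergent_step(recorded, replay_b))  # 1
print(first_divergent_step(recorded, recorded))  # None
```

Under this view, `replay_b` diverges earlier than `replay_a` even though both differ in exactly one step, which is the kind of distinction a raw character diff would not surface.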
Root-cause chain (2B)
curl -X POST https://api.retraceai.tech/api/v1/traces/<id>/root-cause-chain \
-H "x-retrace-key: rt_live_..."
Returns the causal chain from the failure back to the earliest corrupted step (empty output, error, or corrupted tool result) — distinct from the LLM "explain failure" summary.
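The backward walk can be sketched as follows. Everything here is illustrative: the span fields (`depends_on`, `start`, `output`, `error`), the `is_corrupted` heuristic (which only covers empty output and errors, not corrupted tool results), and the tie-break on start time are assumptions, and the endpoint's real response shape is not shown.

```python
def is_corrupted(span: dict) -> bool:
    """Treat a span as corrupted if it errored or produced no output."""
    return span.get("error") is not None or span.get("output") in (None, "")


def root_cause_chain(spans: dict[str, dict], failed_id: str) -> list[str]:
    """Walk dependencies backward from the failed span to the earliest corrupted one."""
    chain = [failed_id]
    seen = {failed_id}
    current = failed_id
    while True:
        upstream = [
            dep for dep in spans[current]["depends_on"]
            if dep not in seen and is_corrupted(spans[dep])
        ]
        if not upstream:
            return chain  # no corrupted dependency left: current is the root cause
        # Follow the corrupted dependency that started first.
        current = min(upstream, key=lambda dep: spans[dep]["start"])
        chain.append(current)
        seen.add(current)


# Hypothetical trace: an empty tool result (s2) cascades into two later errors.
spans = {
    "s1": {"depends_on": [], "start": 0, "output": "plan", "error": None},
    "s2": {"depends_on": ["s1"], "start": 1, "output": "", "error": None},
    "s3": {"depends_on": ["s2"], "start": 2, "output": None, "error": "KeyError"},
    "s4": {"depends_on": ["s3", "s1"], "start": 3, "output": None, "error": "agent failed"},
}

print(root_cause_chain(spans, "s4"))  # ['s4', 's3', 's2']
```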
Regression replay & golden traces (2E)
Mark known-good traces golden, then replay them against a new prompt or model and assert structural equivalence. Regressions fail the GitHub Action gate.
import retrace
retrace.mark_golden(trace_id) # Python
import { markGolden } from "retrace-sdk";
await markGolden(traceId); // TypeScript
# Replay the whole golden set against current code (CI)
curl -X POST https://api.retraceai.tech/api/v1/golden-set/regression-replay \
-H "x-retrace-key: rt_live_..."
Sharing a detected failure
From a fired detection, one click publishes an unlisted link (/t/<slug>) that surfaces the
detection inline on the timeline — drop it into a GitHub issue, Discord, or Slack to get help. See
Sharing & forking.