Record
One decorator captures every LLM call, tool call, and decision your agent makes.
@retrace.record()
Reliability & observability for AI agents
Retrace records every LLM call, tool call, and decision your agent makes — so you can replay any run, fork from the exact step it broke, and verify the fix before you ship. Plus guardrails that stop runaway agents in production.
// the core loop
From a single decorator to a verdict — the reliability loop Retrace owns, end to end.
One decorator captures every LLM call, tool call, and decision your agent makes.
@retrace.record()
Re-run any recorded run, or fork from the exact step it broke and watch the agent diverge.
Automatic failure detection — groundedness, drift, and failure clustering — with runs auto-classified by failure type (MAST).
Guardrails and circuit breakers halt runaway loops and budget blow-outs in production — before damage cascades.
Re-run a change against the failed run and get a verdict — did it actually fix it?
↻ every run feeds the next — record, fix, repeat.
// see it for real
The actual trace UI — a recorded run that failed on an ungrounded answer, then forked from that exact step and passed. Scrub the timeline, open any span, watch the fork diverge.
8 spans · 1 error at step 5 · forked at step 5 · fix verified
// how it's different
Aggregate metrics & dashboards
Replay the exact run — fork from the failed step and cascade-re-execute
Search through raw JSON spans
Semantic search — describe the bug in natural language (pgvector)
No way to test a fix without re-running everything
Prove-the-fix — re-run a change against the failed run, get a verdict
Static alert thresholds
Adaptive guardrails + circuit breakers that halt runaway loops & budget blow-outs
Post-mortem analysis only
Runtime enforcement — stop agents in production, not after
Tied to one framework
One decorator — any Python/TS agent (LangChain · CrewAI · LlamaIndex)
// the platform
Detect failures, enforce limits, evaluate quality, and understand multi-agent systems — on every recorded run.
Automatic failure detection — groundedness via cosine similarity + an LLM faithfulness judge (tiered cheap→deep), statistical drift, failure clustering, and MAST classification of failure types.
Stop runaway agents in production — guardrails and circuit breakers on budget, loops, and steps, fronted by a pre-call enforcement gateway with hold-for-approval.
Quality you can gate on — evaluations, auto eval-rules, CI gates that block bad deploys, datasets, and prove-the-fix verdicts.
See the whole system — multi-agent sessions and agent topology, agent memory, semantic search, prompt versioning, and shareable tapes.
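The tiered groundedness check mentioned above (cheap cosine similarity first, a deeper LLM judge only when needed) could be sketched roughly like this. It is an illustration, not Retrace's implementation: the bag-of-words `embed`, the `low`/`high` thresholds, and the `deep_judge` callable are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_grounded(
    answer: str,
    context: str,
    deep_judge: Callable[[str, str], bool],
    low: float = 0.3,   # hypothetical threshold: below this, clearly ungrounded
    high: float = 0.8,  # hypothetical threshold: at or above this, clearly grounded
) -> bool:
    """Cheap tier decides the clear cases; the deep tier handles the ambiguous band."""
    score = cosine(embed(answer), embed(context))
    if score >= high:
        return True
    if score < low:
        return False
    # Ambiguous: escalate to the expensive judge (e.g. an LLM faithfulness check).
    return deep_judge(answer, context)
```

The point of the tiering is cost: the judge is only called for answers whose similarity falls between the two thresholds.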
// how it works
Sign in with GitHub, copy ~3 lines, run your agent — the first trace streams in live. No infrastructure to manage.
Install the SDK and add one decorator. Calls to OpenAI, Anthropic and Gemini are captured automatically.
Framework-agnostic — works with LangChain, CrewAI, and LlamaIndex.
import retrace
retrace.configure(api_key="rt_live_...")
@retrace.record(name="my-agent")
def run_agent(prompt):
    return agent.invoke(prompt)
Run your agent. Every LLM call, tool call, cost and error streams onto the timeline as it happens.
Or watch it replay step-by-step in the dashboard.
retrace traces tail
Fork from the exact step that broke, change the input, re-run, then prove the fix actually worked.
verify-fix returns a verdict — improved, regressed, or unchanged.
retrace forks create --trace <id> --span <id> --input "grounded prompt"
retrace forks replay <id> --wait
retrace traces verify-fix <id>
// pricing
No credit card required. Upgrade when you need more traces or AI requests.
For experimenting
For solo builders
For shipping
For teams
For scale
// questions
Under 2 minutes. Install the SDK, add one decorator, and traces stream immediately. No infrastructure to manage.
Python and TypeScript SDKs with auto-instrumentation for OpenAI, Anthropic, and Google Gemini. Works with any agent framework — LangChain, CrewAI, Vercel AI SDK, AutoGen, LlamaIndex.
Select any span in a trace, modify its input, and Retrace cascade-replays from that point forward. Context from the fork flows into subsequent LLM calls. You get a side-by-side diff with cost and latency deltas.
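To make the idea of fork and cascade-replay concrete, here is a minimal sketch under a strong simplifying assumption: the run is a linear chain of steps where each output feeds the next input. The names `record_run`, `fork_and_replay`, and `diff` are hypothetical and are not part of the Retrace SDK.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[Any], Any]
Span = Dict[str, Any]


def record_run(steps: List[Step], first_input: Any) -> List[Span]:
    """Run every step in order and record each step's input and output."""
    trace: List[Span] = []
    value = first_input
    for step in steps:
        out = step(value)
        trace.append({"input": value, "output": out})
        value = out
    return trace


def fork_and_replay(
    steps: List[Step], trace: List[Span], fork_at: int, new_input: Any
) -> List[Span]:
    """Keep spans before `fork_at`, then re-execute from there with `new_input`."""
    forked = [dict(span) for span in trace[:fork_at]]
    value = new_input
    for step in steps[fork_at:]:
        out = step(value)  # the fork's output cascades into later steps
        forked.append({"input": value, "output": out})
        value = out
    return forked


def diff(trace: List[Span], forked: List[Span]) -> List[int]:
    """Indices of spans that differ between the original run and the fork."""
    return [i for i, (a, b) in enumerate(zip(trace, forked)) if a != b]
```

With three steps that strip, upper-case, and append "!", forking at step 1 with a new input leaves step 0 untouched and changes steps 1 and 2, which is what the diff reports.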
Runtime policies that monitor your agent in real time. Set cost budgets, loop detection, context overflow limits, or latency caps. When a policy is violated, the agent receives a HALT command that stops it before damage cascades.
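A rough sketch of how such a policy check might work, covering a cost budget, a step cap, and simple loop detection. The `Guardrail` class, the `Halt` exception, and the default limits are hypothetical and only illustrate the idea of checking before each call; they are not Retrace's API.

```python
from collections import Counter
from dataclasses import dataclass, field


class Halt(Exception):
    """Raised to stop the agent when a policy is violated."""


@dataclass
class Guardrail:
    max_cost_usd: float = 1.00  # hypothetical budget
    max_steps: int = 20         # hypothetical step cap
    max_repeats: int = 3        # same action more than this many times = loop
    spent: float = 0.0
    steps: int = 0
    seen: Counter = field(default_factory=Counter)

    def check(self, action: str, cost_usd: float) -> None:
        """Call before each LLM or tool call; raises Halt on a violation."""
        self.steps += 1
        self.spent += cost_usd
        self.seen[action] += 1
        if self.spent > self.max_cost_usd:
            raise Halt("budget exceeded")
        if self.steps > self.max_steps:
            raise Halt("step limit exceeded")
        if self.seen[action] > self.max_repeats:
            raise Halt(f"loop detected: {action!r}")
```

An agent loop would call `guard.check(...)` ahead of every step and treat `Halt` as the signal to stop.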
TLS in transit, encrypted at rest. API keys are SHA-256 hashed. PII auto-redaction runs on every plan as a security baseline. Tenant isolation is enforced at the application layer — every query is scoped per user and backed by a guardrail regression test.
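For readers unfamiliar with hashed API keys, the general pattern looks something like the sketch below, using only the Python standard library. The helper names and the random key format are assumptions; only the `rt_live_` prefix comes from the quickstart snippet, and nothing here describes Retrace's actual storage code.

```python
import hashlib
import hmac
import secrets


def new_api_key(prefix: str = "rt_live_") -> str:
    """Generate a random key; the full key is shown to the user once."""
    return prefix + secrets.token_urlsafe(32)


def hash_key(api_key: str) -> str:
    """Only this SHA-256 hex digest is stored, never the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_key(presented), stored_hash)
```

Because the keys are long random strings, a fast hash such as SHA-256 is generally considered adequate here, unlike for user-chosen passwords.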
Yes. The eval gate endpoint (POST /evaluations/:id/gate) returns pass/fail against a threshold. The CLI command `retrace eval gate` exits with code 1 on failure — perfect for GitHub Actions.
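The gate logic itself is simple to picture. The sketch below is a local illustration only: aggregating by mean score is an assumption (the source says only that the gate compares against a threshold), and `gate` and `exit_code` are hypothetical helpers, not the `retrace` CLI or the HTTP endpoint.

```python
from statistics import mean
from typing import Sequence


def gate(scores: Sequence[float], threshold: float) -> bool:
    """Pass when the mean evaluation score meets the threshold.

    Assumption: mean aggregation; an empty evaluation is treated as a failure.
    """
    if not scores:
        return False
    return mean(scores) >= threshold


def exit_code(passed: bool) -> int:
    """Map the verdict to a process exit code: 0 on pass, 1 on failure."""
    return 0 if passed else 1
```

A CI job fails the build on any non-zero exit code, which is why exiting with 1 on failure is enough to block a deploy.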
LangSmith focuses on tracing and observability. Retrace adds interactive fork & cascade-replay from any step, runtime guardrails that halt runaway agents, groundedness detection, and prove-the-fix verification.
Yes. Each span carries an agent id, sessions group multi-turn conversations, and an agent topology graph shows cross-agent ordering and inter-agent failure modes.
Your agent failed 3 steps before the error surfaced. Fork from the real cause — not the symptom.
No credit card · 2-min setup