// changelog

What's new.

1.24.1

Enforcement & gateway hardening

  • +Composite AND policies now short-circuit on tool-pattern scope — an out-of-scope call no longer depletes the policy's token/USD/step budget, so a later in-scope call trips exactly when its own usage warrants
  • +USD budget counters pipeline their INCRBYFLOAT + EXPIRE atomically (parity with the integer counters) — no window where a USD key could be left without a TTL
  • +The enforcement gateway returns a generic 502 on an unreachable provider (the upstream error detail is logged server-side, not echoed to the caller)
  • +Docs: clarified that gateway streaming calls enforce token budgets pre-call (via max_tokens) only, and that gateway trace_created is metered after the provider responds
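
The scope short-circuit in the first item above can be sketched roughly as follows. This is a minimal illustration, not the enforcement engine's actual code: the `ScopedStepBudget` class, its field names, and the glob-style pattern syntax are all assumptions, and only a step budget is modeled.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class ScopedStepBudget:
    """Hypothetical composite policy: tool-pattern scope AND a step budget."""
    tool_pattern: str   # glob-style scope (assumed syntax), e.g. "db.*"
    max_steps: int
    used_steps: int = 0

    def check(self, tool_name: str) -> str:
        # Scope is evaluated first. An out-of-scope call returns early,
        # so it never touches the budget counter.
        if not fnmatchcase(tool_name, self.tool_pattern):
            return "allow"
        if self.used_steps + 1 > self.max_steps:
            return "block"
        self.used_steps += 1
        return "allow"


policy = ScopedStepBudget(tool_pattern="db.*", max_steps=2)
calls = ["search.web", "db.read", "search.web", "db.write", "db.drop"]
decisions = [policy.check(tool) for tool in calls]
```

Only the three `db.*` calls count against the budget here, so the third in-scope call is the one that gets blocked, regardless of how many out-of-scope calls were interleaved.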

1.24.0

Onboarding & growth

  • +Guided quickstart — the empty dashboard now walks you from API key → install → env var → first trace, with copy-paste Python/TypeScript snippets and a live "waiting for your first trace…" state that flips the moment one arrives
  • +One-click demo workspace — load sample traces, a detection, and a tape into your own project to explore every feature before shipping a span; clearly badged, one-click removable, and never counted toward your usage
  • +Trending tapes — a public /tapes/trending gallery (cached, public-only) plus "forked N times" social proof on shared tapes; fork any public tape to replay it in your own account
  • +Clearer empty states across the app explaining what each feature does and the one action to populate it
  • +Activation funnel — first-party milestone tracking (signup → SDK installed → first trace → detection → replay → tape shared) with an internal admin view; no third-party analytics

1.23.0

Starter plan + pricing refresh

  • +New Starter plan ($29/mo · $290/yr) — 10,000 traces/mo, 30-day retention, 100 fork replays, 25 prove-the-fix runs, and Cassette VCR deterministic replay. The cheapest way to get the replay workflow
  • +Enterprise is now a self-serve annual plan ($24,000/yr · billed annually) — no monthly Enterprise product
  • +Pricing, landing, and billing pages now render every plan from a single shared source of truth, so quotas shown always match what the API enforces
  • +Existing Pro/Teams subscribers are grandfathered — invoices always reflect the amount actually charged, regardless of list-price changes for new customers
  • +Billing fix: every Dodo product now maps to the correct plan (a Starter purchase grants Starter, never Pro); unmapped products alert the team instead of silently defaulting
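
The billing fix in the last item amounts to replacing a silent default with an explicit lookup that fails loudly. A rough sketch of that idea, assuming made-up product IDs and a hypothetical `alert` callback (neither comes from the actual billing code):

```python
from typing import Callable

# Hypothetical product IDs; real billing product IDs will differ.
PRODUCT_TO_PLAN = {
    "prod_starter_monthly": "starter",
    "prod_starter_yearly": "starter",
    "prod_pro_monthly": "pro",
}


class UnmappedProductError(LookupError):
    """Raised when a purchased product has no explicit plan mapping."""


def plan_for_product(product_id: str, alert: Callable[[str], None]) -> str:
    """Return the plan for a product, alerting and failing if it is unmapped."""
    plan = PRODUCT_TO_PLAN.get(product_id)
    if plan is None:
        # No silent fallback to a default plan: notify the team and stop.
        alert(f"unmapped billing product: {product_id}")
        raise UnmappedProductError(product_id)
    return plan


alerts: list[str] = []
starter_plan = plan_for_product("prod_starter_monthly", alerts.append)
try:
    plan_for_product("prod_unknown", alerts.append)
    unknown_rejected = False
except UnmappedProductError:
    unknown_rejected = True
```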

1.22.0

Reliability Platform

  • +Enforcement (circuit breakers) — budget ceilings, loop breakers, debounce windows, and tool-pattern action policies that block or hold an agent action BEFORE it runs (not just flag it after). Server-side decision engine on atomic, fail-closed Valkey counters, a low-latency /enforcement/check pre-call gate, policy versioning + a hold-for-approval queue, and a new Enforcement page
  • +SDK pre-call gate (Python + TypeScript) — local step/token/USD-per-run ceilings enforced offline with zero network, optional server-policy consult, and a typed RetraceEnforcementError that stops the run instead of silently skipping the call
  • +MAST failure taxonomy — failed traces are auto-classified into the 14 MAST failure modes (an LLM judge metered separately from your AI quota, idempotent per trace), surfaced as detections and an Insights → Failure Taxonomy breakdown
  • +Prove the fix — re-run a trace (fork → replay → first-divergence diff → judge verdict) to confirm whether a fix improved, regressed, or left the outcome unchanged; hypothesis testing via typed substitutions (swap model/prompt/tool-output), a dashboard Verify-fix button, a `retrace traces verify-fix` CLI gate, and an AI-widget tool
  • +Multi-agent traces — record which agent produced each span (Python `retrace.agent(...)`, TypeScript `withAgent`, OTel gen_ai semconv), see the agent topology graph on multi-agent traces, and catch agent ping-pong loops (all plans) plus reasoning-action mismatch (FM-2.6) and task derailment (FM-2.3) detectors (Pro+)
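
To illustrate the SDK pre-call gate item, here is a rough offline sketch of per-run step/token/USD ceilings. It does not use the real SDK: `LocalGate`, `RunCeilings`, and `before_call` are hypothetical names, and `RetraceEnforcementError` is redefined locally as a stand-in for the SDK's typed error, whose real definition may differ.

```python
from dataclasses import dataclass


class RetraceEnforcementError(RuntimeError):
    """Local stand-in named after the SDK's typed error; the real class may differ."""


@dataclass
class RunCeilings:
    max_steps: int
    max_tokens: int
    max_usd: float


class LocalGate:
    """Hypothetical offline gate: checks per-run ceilings with no network calls."""

    def __init__(self, ceilings: RunCeilings) -> None:
        self.ceilings = ceilings
        self.steps = 0
        self.tokens = 0
        self.usd = 0.0

    def before_call(self, est_tokens: int, est_usd: float) -> None:
        # Check every ceiling before committing any usage, so a blocked
        # call leaves the counters unchanged.
        c = self.ceilings
        if self.steps + 1 > c.max_steps:
            raise RetraceEnforcementError("step ceiling reached")
        if self.tokens + est_tokens > c.max_tokens:
            raise RetraceEnforcementError("token ceiling reached")
        if self.usd + est_usd > c.max_usd:
            raise RetraceEnforcementError("USD ceiling reached")
        self.steps += 1
        self.tokens += est_tokens
        self.usd += est_usd


gate = LocalGate(RunCeilings(max_steps=3, max_tokens=1000, max_usd=0.50))
gate.before_call(est_tokens=400, est_usd=0.10)
gate.before_call(est_tokens=400, est_usd=0.10)
try:
    gate.before_call(est_tokens=400, est_usd=0.10)  # would total 1200 tokens
    blocked_reason = None
except RetraceEnforcementError as exc:
    blocked_reason = str(exc)
```

Raising instead of returning a flag mirrors the behavior described above: the run stops at the gate rather than silently skipping the call.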

1.0.0

Advanced AI Agent Analytics Engine

  • +Deep Fork Replay — context injection feeds fork output into subsequent prompts; also adds tool output mocking and batch sweeps with multiple variants
  • +Runtime Guardrails — cost budgets, loop detection, context overflow, and latency budgets, with halt/alert/throttle actions enforced in real time via WebSocket
  • +Sessions — group multi-turn conversations by session_id; execution DAG endpoints with causal ordering
  • +Eval CI/CD Gate — POST /evaluations/:id/gate returns pass/fail with regression detection against baselines
  • +Multi-Agent Tracing — agent_id on spans, vector clocks for causal ordering, cross-agent interaction graphs
  • +Hallucination Detection — tiered pipeline using KL divergence, grounding scores, and entropy analysis on every LLM call
  • +Critical Path Analysis — Brandes betweenness centrality identifies decision bottlenecks in trace DAGs
  • +Adaptive Guardrails — LinUCB contextual bandit learns optimal thresholds per trace-type from trigger history
  • +Trace Similarity — Sinkhorn optimal transport (an entropy-regularized approximation of Wasserstein distance) finds similar failures across all traces
  • +Causal Inference — average causal effect estimation via embedding flow on span DAGs
  • +LTL Safety Verification — formal property checking (always/never/eventually/precedes) on traces
  • +Spectral Anomaly Detection — Laplacian eigenvalues on agent interaction graphs detect communication pattern anomalies
  • +Fork Preferences & RLHF Export — record preferred fork outcomes, export as DPO training pairs
  • +Counterfactual Surrogate — predict fork outcomes without re-execution using k-NN on historical fork data
  • +Delta Compression — structural hashing + diff encoding for efficient trace storage
  • +Probabilistic Detection — Count-Min Sketch, HyperLogLog, Bloom filters in Valkey for real-time anomaly detection
  • +Fork State Checkpointing — progress persisted in Valkey, resumes from last completed span on retry
  • +28 Prometheus metrics + 16 alerts + 20-panel Grafana dashboard
  • +GitHub App integration — eval gate results posted on PRs, connect from Settings
  • +CLI v1.1.0 — adds the `retrace sessions` and `retrace guardrails` commands
  • +SDKs v0.3.1 — session_id, agent_id, halt handler
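
As an illustration of the LTL Safety Verification item, the four property kinds it names (always/never/eventually/precedes) can be checked over a finished, finite trace with a few lines of Python. This is a simplified sketch under stated assumptions: spans are plain dicts with a hypothetical `tool` key, and "precedes" is read here as "no `then` span occurs before the first `first` span", which may not match the product's exact semantics.

```python
from typing import Callable, Sequence

Span = dict
Predicate = Callable[[Span], bool]


def always(trace: Sequence[Span], p: Predicate) -> bool:
    """p holds on every span."""
    return all(p(span) for span in trace)


def never(trace: Sequence[Span], p: Predicate) -> bool:
    """p holds on no span."""
    return not any(p(span) for span in trace)


def eventually(trace: Sequence[Span], p: Predicate) -> bool:
    """p holds on at least one span."""
    return any(p(span) for span in trace)


def precedes(trace: Sequence[Span], first: Predicate, then: Predicate) -> bool:
    """No span matching `then` occurs before a span matching `first`."""
    seen_first = False
    for span in trace:
        if first(span):
            seen_first = True
        if then(span) and not seen_first:
            return False
    return True


def calls(tool: str) -> Predicate:
    return lambda span: span.get("tool") == tool


trace = [{"tool": "auth.check"}, {"tool": "db.read"}, {"tool": "db.write"}]
results = {
    "never shell.exec": never(trace, calls("shell.exec")),
    "eventually db.write": eventually(trace, calls("db.write")),
    "auth.check precedes db.write": precedes(trace, calls("auth.check"), calls("db.write")),
    "always has a tool": always(trace, lambda span: "tool" in span),
}
reversed_ok = precedes(trace[::-1], calls("auth.check"), calls("db.write"))
```

All four properties hold on the sample trace, while the reversed trace violates the ordering property because `db.write` appears before any `auth.check`.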

0.6.0

Git Branching for AI Agent Execution

  • +Full Cascade Replay — fork at any span, SDK re-executes entire agent with modified input (not just one call)
  • +SDK Resume Protocol — dedicated listener thread receives 'resume' commands via WebSocket in real time
  • +Ingestion Queue — BullMQ buffered writes with backpressure, 3 retries, exponential backoff
  • +Event Bus Architecture — trace.started, span.created, trace.ended events decouple ingestion from processing
  • +AI Gateway — single entry point for all AI calls with usage enforcement and cost tracking
  • +PII Redaction Layer — auto-detects emails, phones, SSNs, credit cards, API keys, JWTs in span data
  • +Audit Logging — typed actions (trace.created, fork.replayed, api_key.revoked) with IP tracking
  • +Rule Engine — extracted eval rule processing into standalone service with deduplication
  • +Trace Replay Player — play/pause, 0.5x-4x speed controls, auto-advance through spans like a video
  • +Feedback Export — GET /api/v1/feedback/export for RLHF training data (input/output/score)
  • +Hallucination Scoring — a real vocabulary-grounding check replaces the placeholder
  • +Per-Route Rate Limiting — factory function for AI (20/min), search (50/min), export (5/min)
  • +SDK Offline Buffer — stores up to 1000 messages when disconnected, flushes on reconnect
  • +HTTP Retry — 3 attempts with exponential backoff on fallback transport
  • +Composite Indexes — (traceId, startedAt) and (evaluationId, traceId) for query performance
  • +Eval Rules UI — full CRUD page with threshold, webhook, email notification configuration
  • +Governance — toxic output detection, real hallucination rate calculation
  • +GitHub Actions upgraded to upload-artifact@v7 / download-artifact@v8 (Node 24 compatible)
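
The HTTP Retry item (3 attempts with exponential backoff) follows a common pattern that can be sketched as below. The function name, the 0.5-second base delay, and the choice to retry only on `ConnectionError` are assumptions for illustration; the SDK's actual delays and retryable errors are not stated in this changelog.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on ConnectionError with doubling delays."""
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, 2.0s, ...
    raise ValueError("attempts must be at least 1")


# Demo: a transport that fails twice, then succeeds. The sleep function is
# swapped for a recorder so the example runs instantly.
call_count = {"n": 0}
delays: list[float] = []


def flaky_send() -> str:
    call_count["n"] += 1
    if call_count["n"] < 3:
        raise ConnectionError("transport unavailable")
    return "ok"


result = send_with_retry(flaky_send, sleep=delays.append)
```

Injecting `sleep` keeps the backoff schedule testable without real waiting.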

0.5.0

AI Quality Intelligence, Self-Hosted Infra & SDK Sampling

  • +Prompt Versioning — registry with version history, rollback, and diff (POST/GET /api/v1/prompts)
  • +Evaluation Datasets — curate golden traces into benchmark suites for regression testing
  • +Drift Detection — statistical comparison of recent vs baseline metrics (cost, tokens, duration, error rate)
  • +Failure Clustering — automatic grouping of failed traces by normalized error patterns
  • +Human Feedback Loop — thumbs up/down on traces for training data curation
  • +Trace Sampling — configurable sample_rate (0.0-1.0) in both Python and TypeScript SDKs
  • +Eval Rules Automation UI — create/manage threshold alerts with webhook and email notifications
  • +Eval Rules Executor — filter by project/model, threshold comparison, webhook POST, email alerts
  • +Plan-aware eval judging — Enterprise gets gemini-2.5-pro; Pro and Teams get gemini-2.5-flash
  • +Billing usage dashboard — progress bars showing traces, tapes, forks, AI requests with limits
  • +Featured tapes section on marketing homepage
  • +Distributed event pipeline — Kafka producer/consumer for async span ingestion
  • +Idempotent processing — duplicate spans from SDK retries safely ignored
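
The Failure Clustering item groups failed traces by normalized error patterns. One plausible way to do that, shown here as a sketch rather than the product's actual normalizer, is to replace the variable parts of each message (quoted values, hex IDs, numbers) with placeholders and group on the result. The regexes and sample messages are illustrative assumptions.

```python
import re
from collections import defaultdict


def normalize_error(message: str) -> str:
    """Replace variable parts of an error message with placeholders."""
    msg = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", message)  # quoted values
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", msg)          # hex addresses/ids
    msg = re.sub(r"\d+", "<n>", msg)                        # remaining numbers
    return msg.strip().lower()


failed_traces = [
    ("t1", "Timeout after 30s calling 'search'"),
    ("t2", "Timeout after 45s calling 'lookup'"),
    ("t3", "Rate limit exceeded (429)"),
]

clusters: dict[str, list[str]] = defaultdict(list)
for trace_id, error in failed_traces:
    clusters[normalize_error(error)].append(trace_id)
```

The two timeout messages collapse into one cluster even though their durations and tool names differ, while the rate-limit error stays separate.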

0.3.0

Multi-Provider Auto-Instrumentation & Team Collaboration

  • +OpenAI auto-instrumentation — GPT-5.5, GPT-5.4, GPT-5, GPT-4.1 series captured automatically
  • +Anthropic auto-instrumentation — Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 captured automatically
  • +Gemini auto-instrumentation — Gemini 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash captured automatically
  • +Public tape gallery — explore trending, newest, and featured tapes at /explore
  • +Team collaboration — comment on spans, assign traces to teammates, notifications
  • +Continuous evaluations — auto-run evals on matching traces with webhook alerts
  • +Resumable forks — mark traces as resumable for full agent re-execution from any checkpoint
  • +Free tier expanded from 500 to 5,000 traces/month
  • +MIT license — core platform is now open source
  • +Security hardening — 35 audit findings resolved across all severity levels

0.2.0

Fork Engine, OpenTelemetry & Evaluations

  • +Fork execution engine — replay LLM calls with modified inputs and side-by-side diff
  • +OpenTelemetry OTLP/JSON ingestion endpoint for framework-agnostic tracing
  • +LLM-as-judge evaluation system with custom criteria and batch runs
  • +Real-time WebSocket streaming for live trace updates in the dashboard
  • +Semantic search across all traces and spans with vector similarity
  • +Migrated to Dodo Payments for subscription billing
  • +Trace deletion with full cascade cleanup
  • +Fork comparison UI with divergence scoring
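
For the OTLP/JSON ingestion item, the sketch below builds a minimal single-span payload in the standard OTLP JSON encoding (hex-encoded IDs, 64-bit timestamps as strings). The service name, scope name, span name, and model value are placeholders, and the changelog does not state the ingestion endpoint's path or auth scheme, so no request is sent here.

```python
import json
import secrets
import time


def build_otlp_span(name: str, model: str, duration_ms: int) -> dict:
    """Build a one-span OTLP/JSON trace payload."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "demo-agent"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example.instrumentation"},
                "spans": [{
                    "traceId": secrets.token_hex(16),    # 32 hex characters
                    "spanId": secrets.token_hex(8),      # 16 hex characters
                    "name": name,
                    "kind": 3,                           # SPAN_KIND_CLIENT
                    "startTimeUnixNano": str(start_ns),  # 64-bit ints as strings
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        {"key": "gen_ai.request.model",
                         "value": {"stringValue": model}},
                    ],
                }],
            }],
        }],
    }


payload = build_otlp_span("chat", "example-model", duration_ms=250)
body = json.dumps(payload)  # send as the request body with Content-Type: application/json
```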

0.1.0

Public Beta Launch

  • +Core trace recording with Python and TypeScript SDKs
  • +Interactive tape player for step-by-step execution replay
  • +Fork from any span and rerun with modified inputs
  • +Shareable tape links with OG images — no login required to view
  • +Persistent agent memory with semantic search and auto-extraction
  • +Skills system for reusable, composable agent capabilities
  • +MCP server integration for Claude Code and Cursor
  • +WebSocket streaming for real-time span ingestion from SDKs