Fork & Replay
Branch from any decision point and cascade-replay the entire agent with modified input.
Fork from any span in a trace, modify the input, and watch the entire agent re-execute from that point forward. Context from the fork flows into subsequent LLM calls automatically.
Creating a Fork
curl -X POST https://api.retraceai.tech/api/v1/forks \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "abc-123",
    "fork_point_span_id": "span-456",
    "modified_input": "What about using a different approach?"
  }'
Replaying a Fork
curl -X POST https://api.retraceai.tech/api/v1/forks/:id/replay \
  -H "x-retrace-key: rt_live_..."
The SDK re-executes the full function from the fork point. Every LLM call after the fork receives context about the modified path.
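The create and replay calls can also be issued from a script. The following is a minimal Python sketch that only builds the two requests with the standard library; the helper names `create_fork_request` and `replay_fork_request` are hypothetical, and the shape of the create response (including which field carries the fork id) is not documented here, so extracting the id is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://api.retraceai.tech/api/v1"


def create_fork_request(api_key, trace_id, span_id, modified_input):
    """Build (but do not send) the POST /forks request."""
    body = json.dumps({
        "trace_id": trace_id,
        "fork_point_span_id": span_id,
        "modified_input": modified_input,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/forks",
        data=body,
        method="POST",
        headers={"x-retrace-key": api_key, "Content-Type": "application/json"},
    )


def replay_fork_request(api_key, fork_id):
    """Build (but do not send) the POST /forks/:id/replay request."""
    return urllib.request.Request(
        f"{BASE_URL}/forks/{fork_id}/replay",
        method="POST",
        headers={"x-retrace-key": api_key},
    )
```

Either request can be sent with `urllib.request.urlopen(req)`. Keeping request construction separate from sending makes the payloads easy to inspect before anything is executed.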
Viewing the Diff
curl https://api.retraceai.tech/api/v1/forks/:id/diff \
  -H "x-retrace-key: rt_live_..."
Returns a side-by-side comparison: original vs forked execution with cost, latency, and token deltas.
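The diff response schema is not shown here, so the following is only an illustration of how cost, latency, and token deltas can be derived from two runs. The `RunMetrics` fields and the sign convention (forked minus original) are assumptions, not the documented format.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    # Hypothetical per-run totals; not the documented response schema.
    cost_usd: float
    latency_ms: float
    tokens: int


def deltas(original: RunMetrics, forked: RunMetrics) -> dict:
    """Return forked-minus-original deltas; negative means the fork used less."""
    return {
        "cost_usd": forked.cost_usd - original.cost_usd,
        "latency_ms": forked.latency_ms - original.latency_ms,
        "tokens": forked.tokens - original.tokens,
    }
```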
Fork Sweeps (Batch)
Test multiple input variants simultaneously:
curl -X POST https://api.retraceai.tech/api/v1/forks/sweep \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "abc-123",
    "fork_point_span_id": "span-456",
    "variants": [
      { "label": "concise", "modified_input": "Be brief." },
      { "label": "detailed", "modified_input": "Explain in detail." }
    ]
  }'
Sweeps run with concurrency 3 and checkpoint progress in Valkey for resumability.
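To make the concurrency and checkpointing behavior concrete, here is a small Python sketch of the general pattern: a worker pool capped at three, plus a checkpoint store that lets a restarted sweep skip finished variants. This is not Retrace's implementation. `run_sweep` and `run_variant` are hypothetical names, and a plain dict stands in for Valkey.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sweep(variants, run_variant, checkpoint, max_workers=3):
    """Run each variant at most once, skipping labels already in `checkpoint`.

    `checkpoint` is any dict-like store; a plain dict stands in for Valkey here.
    """
    pending = [v for v in variants if v["label"] not in checkpoint]

    def run_and_record(variant):
        result = run_variant(variant)
        # Record progress as soon as each variant finishes.
        checkpoint[variant["label"]] = result
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(run_and_record, pending))

    return {v["label"]: checkpoint[v["label"]] for v in variants}
```

If the process stops midway, calling `run_sweep` again with the same checkpoint store re-runs only the variants that had not been recorded.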
Sensitivity Analysis
Score how sensitive an agent's output is to input perturbations:
curl -X POST https://api.retraceai.tech/api/v1/forks/sensitivity \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "abc-123", "fork_point_span_id": "span-456" }'
CLI
retrace forks create --trace <id> --span <id> --input "new prompt"
retrace forks replay <id> --wait
retrace forks diff <id> --json
Non-Determinism in Replay
Fork replay re-executes your actual agent function with modified input. Because LLMs are inherently non-deterministic (temperature, sampling, model updates), the forked path may produce different outputs even with identical input. This is by design — Retrace shows you what would happen, not what must happen.
To increase reproducibility:
- Set temperature=0 in your LLM calls
- Use a fixed seed parameter if your provider supports it
- Pin a specific model version (e.g., gemini-2.5-flash-001 instead of gemini-2.5-flash)
The divergence score in the diff view accounts for this — it measures semantic difference between outputs, not exact string matching.
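How the divergence score is computed is not specified here. As a rough illustration of scoring by similarity rather than exact string equality, the sketch below takes one minus the cosine similarity of two embeddings. The function names are hypothetical, and `bow_embed` is a toy word-count stand-in for a real embedding model, so it captures lexical overlap only, not meaning.

```python
import math
from collections import Counter


def bow_embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def divergence(a, b, embed=bow_embed):
    """Return 1 - cosine similarity of the two embeddings (0.0 means identical)."""
    va, vb = embed(a), embed(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = (math.sqrt(sum(x * x for x in va.values()))
            * math.sqrt(sum(x * x for x in vb.values())))
    if norm == 0:
        # At least one text is empty: identical only if both are empty.
        return 0.0 if not va and not vb else 1.0
    return 1.0 - dot / norm
```

Two outputs that differ only in letter case score as identical under this toy embedding, even though a string comparison would report a mismatch. A real semantic score would pass a model-backed `embed` function instead.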
Cost Estimation
Before executing a fork, estimate the cost:
curl https://api.retraceai.tech/api/v1/forks/:id/estimate \
  -H "x-retrace-key: rt_live_..."
Returns a breakdown showing naive cost vs optimized cost with savings from semantic caching, context compression, and differential replay. See Replay Optimizations for details.
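The arithmetic behind a naive-vs-optimized breakdown can be sketched as follows. The function name and the input shapes are assumptions, as is the idea that the optimized cost equals the naive cost minus the sum of the individual savings; the actual response format is described under Replay Optimizations.

```python
def summarize_estimate(naive_cost, savings):
    """Combine a naive cost with per-optimization savings into one summary.

    `savings` maps an optimization name to the amount it saves, in the same
    currency unit as `naive_cost`. Both shapes are assumptions.
    """
    total_saved = sum(savings.values())
    # Never report a negative cost if the savings exceed the naive figure.
    optimized = max(naive_cost - total_saved, 0.0)
    pct = 0.0 if naive_cost == 0 else 100.0 * (naive_cost - optimized) / naive_cost
    return {"naive": naive_cost, "optimized": optimized, "saved_pct": pct}
```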
Inline Editing
Edit a span's input directly and create a fork in one step:
curl -X POST https://api.retraceai.tech/api/v1/traces/:id/inline-edit \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{"span_id": "span-456", "modified_input": "Try a different approach"}'
After replay, view the step-by-step diff:
curl https://api.retraceai.tech/api/v1/traces/:id/diff/:forkId \
  -H "x-retrace-key: rt_live_..."
Commit a successful fork as the new baseline:
curl -X POST https://api.retraceai.tech/api/v1/forks/:id/commit \
  -H "x-retrace-key: rt_live_..."
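The inline-edit, step diff, and commit calls can be scripted in sequence. The Python sketch below only constructs the three requests with the standard library. The helper names are hypothetical, and because the response fields (such as where the new fork id appears) are not documented here, the fork id is passed in explicitly.

```python
import json
import urllib.request

BASE_URL = "https://api.retraceai.tech/api/v1"


def _request(api_key, method, path, payload=None):
    """Build (but do not send) an authenticated request for `path`."""
    headers = {"x-retrace-key": api_key}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data,
                                  method=method, headers=headers)


def inline_edit(api_key, trace_id, span_id, modified_input):
    """POST /traces/:id/inline-edit: edit a span's input and fork in one step."""
    return _request(api_key, "POST", f"/traces/{trace_id}/inline-edit",
                    {"span_id": span_id, "modified_input": modified_input})


def step_diff(api_key, trace_id, fork_id):
    """GET /traces/:id/diff/:forkId: step-by-step diff after replay."""
    return _request(api_key, "GET", f"/traces/{trace_id}/diff/{fork_id}")


def commit_fork(api_key, fork_id):
    """POST /forks/:id/commit: promote the fork to the new baseline."""
    return _request(api_key, "POST", f"/forks/{fork_id}/commit")
```

Each returned object can be sent with `urllib.request.urlopen(req)` once the caller is satisfied with the URL and payload.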