Python SDK
Complete reference for the Retrace Python SDK.
Installation
pip install retrace-sdk
Requires Python 3.10+. The package has minimal dependencies: websocket-client for streaming and requests for HTTP transport.
Configuration
Configure the SDK at application startup:
import retrace
retrace.configure(
    api_key="rt_live_...",
    base_url="https://api.retraceai.tech",
    project_id="my-project",
)
Alternatively, set environment variables and call retrace.configure() with no arguments:
| Variable | Default | Description |
|---|---|---|
| RETRACE_API_KEY | (none) | Your API key (required) |
| RETRACE_BASE_URL | https://api.retraceai.tech | API endpoint |
| RETRACE_PROJECT_ID | (none) | Default project identifier |
| RETRACE_ENABLED | true | Set to false to disable all tracing |
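To make the defaults in the table concrete, here is a minimal sketch of how environment-based resolution could work. The names RetraceSettings and load_settings are hypothetical and are not part of the SDK. The rule that only the string "false" disables tracing is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RetraceSettings:
    """Hypothetical container for the values in the table above."""
    api_key: str
    base_url: str
    project_id: Optional[str]
    enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> RetraceSettings:
    """Resolve settings from environment variables, applying the documented defaults."""
    api_key = env.get("RETRACE_API_KEY")
    if not api_key:
        # RETRACE_API_KEY is the only variable marked as required.
        raise ValueError("RETRACE_API_KEY is required")
    return RetraceSettings(
        api_key=api_key,
        base_url=env.get("RETRACE_BASE_URL", "https://api.retraceai.tech"),
        project_id=env.get("RETRACE_PROJECT_ID"),
        # Assumption: only the literal string "false" (any case) disables tracing.
        enabled=env.get("RETRACE_ENABLED", "true").strip().lower() != "false",
    )
```

Passing a plain dict instead of os.environ makes the resolution easy to exercise in isolation.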
@record Decorator
The simplest way to trace a function. Input arguments and return values are captured automatically:
from openai import OpenAI
client = OpenAI()  # Reads OPENAI_API_KEY from the environment
@retrace.record(name="research-agent")
def research(topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": f"Research {topic}"}],
    )
    return response.choices[0].message.content
If the function raises an exception, the trace is marked as failed and the error is recorded.
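The capture behaviour described here (inputs, return value, and failure status) can be pictured with a small stand-in decorator. This is an illustrative sketch only, not the SDK's implementation. The TRACES list and the local record function are hypothetical and write to memory instead of the Retrace backend.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory sink standing in for the Retrace backend.
TRACES: list[dict[str, Any]] = []


def record(name: str) -> Callable:
    """Illustrative stand-in for retrace.record used as a decorator."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            trace = {"name": name, "input": {"args": args, "kwargs": kwargs}}
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                # Mark the trace as failed, keep the error, and re-raise unchanged.
                trace.update(status="failed", error=repr(exc))
                raise
            else:
                trace.update(status="ok", output=result)
                return result
            finally:
                # Runs on both paths, so every call produces exactly one trace.
                TRACES.append(trace)
        return wrapper
    return decorator


@record(name="divide")
def divide(a: float, b: float) -> float:
    return a / b
```

The exception is re-raised after being recorded, so wrapping a function this way does not change what callers observe.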
Context Manager
For more control over trace lifecycle, use the context manager form:
with retrace.record(name="agent", input={"prompt": "hello"}) as t:
    result = agent.run("hello")
    t.output = result
You can attach metadata, set custom status, or add tags within the block:
with retrace.record(name="pipeline") as t:
    t.metadata = {"environment": "staging", "version": "1.2.0"}
    result = run_pipeline()
    t.output = result
Manual Spans
Create child spans for granular visibility into individual steps:
with retrace.record(name="agent") as t:
    span = t._recorder.start_span(
        name="web_search",
        span_type=retrace.SpanType.TOOL_CALL,
        input={"query": "quantum computing"},
    )
    results = search("quantum computing")
    t._recorder.end_span(span.id, output=results)
Supported span types: LLM_CALL, TOOL_CALL, TOOL_RESULT, REASONING, ACTION, ERROR, FORK_POINT.
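In the manual form above, end_span is skipped if the traced step raises. One way to guard against that is a small context manager around the two calls. The helper span_scope below is hypothetical. It assumes only the start_span and end_span signatures shown in the example, and FakeRecorder exists solely so the sketch runs without the SDK.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator


@contextmanager
def span_scope(recorder: Any, name: str, span_type: Any, input: Any = None) -> Iterator[dict]:
    """Open a span and guarantee end_span runs, even if the body raises."""
    span = recorder.start_span(name=name, span_type=span_type, input=input)
    holder: dict = {"output": None}
    try:
        yield holder
    finally:
        # Close the span with whatever output was set before leaving the block.
        recorder.end_span(span.id, output=holder["output"])


class FakeRecorder:
    """In-memory stand-in for the recorder; not part of the SDK."""

    def __init__(self) -> None:
        self.ended: list[tuple[int, Any]] = []

    def start_span(self, name: str, span_type: Any, input: Any = None) -> SimpleNamespace:
        return SimpleNamespace(id=1, name=name)

    def end_span(self, span_id: int, output: Any = None) -> None:
        self.ended.append((span_id, output))
```

Inside a trace, the same helper would presumably be called as span_scope(t._recorder, "web_search", retrace.SpanType.TOOL_CALL, input={...}), with the step's result assigned to the yielded dict under "output".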
Auto-Instrumentation
Retrace automatically captures LLM calls from the providers below (OpenAI, Anthropic, and Gemini) when the SDK is installed alongside them.
OpenAI
from retrace.interceptors.openai import install_openai_interceptor
install_openai_interceptor()
Captures all openai.chat.completions.create() calls including model, messages, tokens, cost, and latency.
Anthropic
from retrace.interceptors.anthropic import install_anthropic_interceptor
install_anthropic_interceptor()
Captures all anthropic.messages.create() calls.
Gemini
Automatically captures all Gemini API calls as spans:
from retrace.interceptors.gemini import install_gemini_interceptor
install_gemini_interceptor()
Captured data includes model, messages, token usage, latency, and cost.
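For intuition about what an interceptor does, the sketch below wraps a method so that each call is reported along with its latency. It is a generic illustration of the wrapping technique, not how the Retrace interceptors are implemented. Both install_interceptor and FakeCompletions are hypothetical names.

```python
import functools
import time
from typing import Any, Callable


def install_interceptor(owner: Any, method_name: str, on_call: Callable[[dict], None]) -> None:
    """Replace owner.method_name with a wrapper that reports each call to on_call."""
    original = getattr(owner, method_name)

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = original(*args, **kwargs)
        # Report the call after it succeeds; the caller still gets the original result.
        on_call({
            "method": method_name,
            "kwargs": kwargs,
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result

    setattr(owner, method_name, wrapper)


class FakeCompletions:
    """Stand-in for a provider client; not a real OpenAI, Anthropic, or Gemini class."""

    def create(self, **kwargs: Any) -> dict:
        return {"model": kwargs["model"], "content": "hi"}
```

This simplified version reports only successful calls. A fuller version would also record exceptions before re-raising them.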
Transport Modes
| Mode | Protocol | Use Case |
|---|---|---|
| auto | WebSocket with HTTP fallback | Default, recommended for most environments |
| ws | WebSocket only | Real-time streaming, long-running agents |
| http | HTTP batch | Serverless (Lambda, Cloud Functions) |
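If the same code runs both on servers and in serverless functions, the mode can be chosen at startup. The sketch below is one possible heuristic, not SDK behaviour. The helper pick_transport is hypothetical, and the environment variables it checks are an assumption to verify against your platform.

```python
import os
from typing import Mapping

# Environment variables commonly set by serverless runtimes. Treat this list as
# an assumption to verify against your platform, not an exhaustive detector.
SERVERLESS_MARKERS = ("AWS_LAMBDA_FUNCTION_NAME", "K_SERVICE")


def pick_transport(env: Mapping[str, str] = os.environ) -> str:
    """Return "http" in serverless-looking environments, otherwise "auto"."""
    if any(env.get(marker) for marker in SERVERLESS_MARKERS):
        return "http"
    return "auto"
```

The returned string is intended for the transport argument of retrace.configure().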
Configure transport explicitly:
retrace.configure(api_key="rt_live_...", transport="http")
Disabling in Tests
Prevent trace emission during test runs:
import os
os.environ["RETRACE_ENABLED"] = "false"
[!TIP] Async is supported: @record traces async def functions, and the OpenAI/Anthropic/Gemini interceptors capture async (and async-streaming) calls. The transport layer itself is synchronous (websocket-client/requests); on async frameworks use transport="http" so emission never blocks the event loop.
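For the test setup in this section, assigning to os.environ directly leaves the variable set for the rest of the process. A scoped alternative is sketched below using the standard library's unittest.mock.patch.dict. The class names are hypothetical, and the sketch assumes the SDK reads RETRACE_ENABLED after the test has started.

```python
import os
import unittest
from unittest import mock


class TracingDisabledTestCase(unittest.TestCase):
    """Base class that sets RETRACE_ENABLED=false for the duration of each test."""

    def setUp(self) -> None:
        patcher = mock.patch.dict(os.environ, {"RETRACE_ENABLED": "false"})
        patcher.start()
        # Restore the previous environment after the test, pass or fail.
        self.addCleanup(patcher.stop)


class ExampleTest(TracingDisabledTestCase):
    def test_flag_is_set(self) -> None:
        self.assertEqual(os.environ["RETRACE_ENABLED"], "false")
```

Test classes that inherit from the base class get the flag without repeating the setup.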
Resumable Traces
Mark a trace as resumable to enable fork-and-replay from any span:
@retrace.record(name="my-agent", resumable=True)
def run_agent(prompt, context):
    # Each LLM call becomes a checkpoint
    plan = call_planner(prompt)
    result = call_executor(plan)
    return result
When a fork is triggered from a specific span, the agent can be re-executed from that point with modified inputs.
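The fork-and-replay idea can be illustrated with a toy pipeline: outputs recorded before the fork point are reused, and only the steps from the fork point onward are run again with the modified input. This is a conceptual sketch with hypothetical helpers (run_steps, fork_from). It does not show how the SDK stores or replays checkpoints.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def run_steps(steps: list[Step], value: Any) -> list[Any]:
    """Run steps in order and return every intermediate output (the checkpoints)."""
    outputs = []
    for step in steps:
        value = step(value)
        outputs.append(value)
    return outputs


def fork_from(steps: list[Step], recorded: list[Any], index: int, new_input: Any) -> list[Any]:
    """Reuse recorded outputs before `index`, then re-run from `index` with new_input."""
    return recorded[:index] + run_steps(steps[index:], new_input)
```

With a planner step followed by an executor step, forking at index 1 keeps the recorded plan in the history and re-runs only the executor on the edited input.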
Sampling
Control what percentage of traces are recorded to reduce costs at high volume:
retrace.configure(
    api_key="rt_live_...",
    sample_rate=0.5,  # Record 50% of traces
)
Or via environment variable:
RETRACE_SAMPLE_RATE=0.1  # Record 10% of traces
Sampled-out traces execute normally with zero SDK overhead. All failures are always captured regardless of sample rate.
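A sampling decision of this kind can be sketched in a few lines. The functions below are hypothetical and are not the SDK's algorithm. The optional seed branch anticipates the seeded mode described under Deterministic Sampling, and the SHA-256 hashing scheme is an assumption chosen only for illustration.

```python
import hashlib
import random
from typing import Optional


def should_sample(name: str, sample_rate: float, seed: Optional[str] = None) -> bool:
    """Decide whether to record a trace for the function called `name`."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    if seed is None:
        # Unseeded: an independent random draw per call.
        return random.random() < sample_rate
    # Seeded: hash seed + name to a stable fraction in [0, 1).
    digest = hashlib.sha256(f"{seed}:{name}".encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < sample_rate


def should_keep(sampled: bool, failed: bool) -> bool:
    """Failures are kept regardless of the sampling decision."""
    return sampled or failed
```

With a seed, the same function name always yields the same decision, which makes sampled runs reproducible across processes.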
Deterministic Sampling
For reproducible sampling decisions, provide a seed:
retrace.configure(
    api_key="rt_live_...",
    sample_rate=0.5,
    sample_seed="my-stable-seed",  # Same seed + function name = same decision
)
Or via environment variable:
RETRACE_SAMPLE_SEED=my-stable-seed
W3C Traceparent Propagation
Inject distributed tracing headers into outgoing HTTP requests:
import requests
from retrace.traceparent import set_trace_context, inject_traceparent, parse_traceparent
# Set active context inside a traced function
set_trace_context(trace_id, span_id)
# Inject into outgoing requests
headers = inject_traceparent({"Content-Type": "application/json"})
# headers now includes: {"traceparent": "00-{trace_id}-{span_id}-01", ...}
requests.get("https://downstream.com/api", headers=headers)
# Parse incoming traceparent
result = parse_traceparent(request.headers["traceparent"])
# result is (trace_id, parent_id, sampled)
Streaming Interception
The SDK automatically captures streaming responses from OpenAI and Anthropic:
# Streaming is intercepted transparently
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,  # ← SDK wraps the generator, emits span on completion
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
# Span emitted here with full output, tokens, and cost
Works identically for Anthropic:
with client.messages.stream(model="claude-sonnet-4", ...) as stream:
    for text in stream.text_stream:
        print(text, end="")
# Span captured automatically
Token ID Capture
The Python SDK supports storing token IDs and log-probabilities for speculative decoding during replay:
@retrace.record(name="my-agent")
def run(prompt: str):
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,  # Enables token ID capture
    )
    return response.choices[0].message.content
When logprobs=True is set, the interceptor extracts token IDs from the response and includes them in the span data. These are used during fork replay to achieve near-instant verification of unchanged outputs.
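As a rough picture of the data involved, the sketch below reads the per-token entries that a Chat Completions response carries when logprobs=True, then compares a recorded token sequence against a replayed one. It works with the token strings and log-probabilities in the response. How the SDK derives and stores token IDs is not shown here, and the helper names are hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def extract_token_logprobs(response: Any) -> list[tuple[str, float]]:
    """Return (token, logprob) pairs from a Chat Completions style response."""
    logprobs = response.choices[0].logprobs
    if logprobs is None or logprobs.content is None:
        return []
    return [(item.token, item.logprob) for item in logprobs.content]


def first_divergence(recorded: list[str], replayed: list[str]) -> int:
    """Index of the first differing token, or -1 if the sequences are identical."""
    for i, (a, b) in enumerate(zip(recorded, replayed)):
        if a != b:
            return i
    if len(recorded) != len(replayed):
        # One sequence is a strict prefix of the other.
        return min(len(recorded), len(replayed))
    return -1


def fake_response(tokens: list[tuple[str, float]]) -> SimpleNamespace:
    """Build a minimal response-shaped object so the sketch runs without an API call."""
    content = [SimpleNamespace(token=t, logprob=lp) for t, lp in tokens]
    return SimpleNamespace(choices=[SimpleNamespace(logprobs=SimpleNamespace(content=content))])
```

A result of -1 from first_divergence means the replayed output matches the recording token for token, which is the case a replay can skip re-verifying in detail.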