Installation

pip install retrace-sdk

Requires Python 3.10+. The package has minimal dependencies: websocket-client for streaming and requests for HTTP transport.

Configuration

Configure the SDK at application startup:

import retrace

retrace.configure(
    api_key="rt_live_...",
    base_url="https://api.retraceai.tech",
    project_id="my-project",
)

Alternatively, set environment variables and call retrace.configure() with no arguments:

| Variable | Default | Description |
| --- | --- | --- |
| `RETRACE_API_KEY` | (none) | Your API key (required) |
| `RETRACE_BASE_URL` | `https://api.retraceai.tech` | API endpoint |
| `RETRACE_PROJECT_ID` | (none) | Default project identifier |
| `RETRACE_ENABLED` | `true` | Set to `false` to disable all tracing |
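
To make the defaults in the table concrete, here is a minimal sketch of how such variables could be resolved. The `Settings` class and `load_settings` function are hypothetical names used for illustration and are not part of the SDK; the rule that only the literal `false` disables tracing is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    project_id: Optional[str]
    enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from RETRACE_* variables, applying the documented defaults."""
    api_key = env.get("RETRACE_API_KEY")
    if not api_key:
        # The table marks the API key as required, so fail early.
        raise ValueError("RETRACE_API_KEY is required")
    return Settings(
        api_key=api_key,
        base_url=env.get("RETRACE_BASE_URL", "https://api.retraceai.tech"),
        project_id=env.get("RETRACE_PROJECT_ID"),
        # Assumption: only the literal "false" (any case) disables tracing.
        enabled=env.get("RETRACE_ENABLED", "true").strip().lower() != "false",
    )
```

Passing a plain dictionary instead of `os.environ` makes the resolution easy to exercise in isolation.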

@record Decorator

The simplest way to trace a function. Input arguments and return values are captured automatically:

@retrace.record(name="research-agent")
def research(topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": f"Research {topic}"}]
    )
    return response.choices[0].message.content

If the function raises an exception, the trace is marked as failed and the error is recorded.
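
The capture-and-fail behavior described above can be pictured with a small standalone decorator. This is a sketch of the general pattern, not the SDK's implementation: `record_sketch`, the `TRACES` list, and the field names in the trace dictionary are all hypothetical.

```python
import functools
from typing import Any, Callable

TRACES: list[dict[str, Any]] = []  # hypothetical stand-in for the SDK's transport


def record_sketch(name: str) -> Callable:
    """Decorator that stores inputs, the return value, and any error for each call."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            trace = {"name": name, "input": {"args": args, "kwargs": kwargs},
                     "output": None, "status": "running", "error": None}
            try:
                trace["output"] = fn(*args, **kwargs)
                trace["status"] = "completed"
                return trace["output"]
            except Exception as exc:
                trace["status"] = "failed"
                trace["error"] = repr(exc)
                raise  # the caller still sees the original exception
            finally:
                TRACES.append(trace)  # emitted on success and on failure
        return wrapper
    return decorator
```

Re-raising inside `except` keeps the decorator transparent: the traced function behaves the same for its callers whether or not tracing is active.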

Context Manager

For more control over trace lifecycle, use the context manager form:

with retrace.record(name="agent", input={"prompt": "hello"}) as t:
    result = agent.run("hello")
    t.output = result

You can attach metadata, set a custom status, or add tags within the block. The example below attaches metadata:

with retrace.record(name="pipeline") as t:
    t.metadata = {"environment": "staging", "version": "1.2.0"}
    result = run_pipeline()
    t.output = result
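
For readers curious how a trace object with assignable `output` and `metadata` can be managed by a `with` block, the following sketch uses `contextlib.contextmanager`. `TraceSketch`, `record_block`, and the status strings are hypothetical; the SDK's internals may differ.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class TraceSketch:
    name: str
    input: Any = None
    output: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)
    status: str = "running"
    error: Optional[str] = None


@contextmanager
def record_block(name: str, input: Any = None) -> Iterator[TraceSketch]:
    """Yield a mutable trace; mark it failed if the block raises, completed otherwise."""
    trace = TraceSketch(name=name, input=input)
    try:
        yield trace
    except Exception as exc:
        trace.status, trace.error = "failed", repr(exc)
        raise  # do not swallow the caller's exception
    else:
        trace.status = "completed"
```

Because the trace is a plain mutable object, anything assigned inside the block is still attached to it when the block exits.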

Manual Spans

Create child spans for granular visibility into individual steps:

with retrace.record(name="agent") as t:
    span = t._recorder.start_span(
        name="web_search",
        span_type=retrace.SpanType.TOOL_CALL,
        input={"query": "quantum computing"}
    )
    results = search("quantum computing")
    t._recorder.end_span(span.id, output=results)

Supported span types: LLM_CALL, TOOL_CALL, TOOL_RESULT, REASONING, ACTION, ERROR, FORK_POINT.
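
The start/end pairing can be modeled with a few lines of standard-library code. This is an illustrative sketch only: the `Recorder` and `Span` classes below are hypothetical, and the string values assigned to each `SpanType` member are assumptions (only the member names come from the list above).

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class SpanType(Enum):
    LLM_CALL = "llm_call"
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    REASONING = "reasoning"
    ACTION = "action"
    ERROR = "error"
    FORK_POINT = "fork_point"


@dataclass
class Span:
    name: str
    span_type: SpanType
    input: Any = None
    output: Any = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.monotonic)
    ended_at: Optional[float] = None


class Recorder:
    """Keeps open and closed spans keyed by id."""

    def __init__(self) -> None:
        self.spans: dict[str, Span] = {}

    def start_span(self, name: str, span_type: SpanType, input: Any = None) -> Span:
        span = Span(name=name, span_type=span_type, input=input)
        self.spans[span.id] = span
        return span

    def end_span(self, span_id: str, output: Any = None) -> Span:
        span = self.spans[span_id]  # raises KeyError for an unknown id
        span.output = output
        span.ended_at = time.monotonic()
        return span
```

A span that is started but never ended keeps `ended_at` as `None`, so wrapping the traced step in `try`/`finally` is a reasonable way to guarantee the closing call.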

Auto-Instrumentation

Retrace captures LLM calls from OpenAI, Anthropic, and Gemini once the corresponding interceptor is installed. Each interceptor is enabled with a single call.

OpenAI

from retrace.interceptors.openai import install_openai_interceptor
install_openai_interceptor()

Captures all openai.chat.completions.create() calls including model, messages, tokens, cost, and latency.

Anthropic

from retrace.interceptors.anthropic import install_anthropic_interceptor
install_anthropic_interceptor()

Captures all anthropic.messages.create() calls.

Gemini

Automatically captures all Gemini API calls as spans:

from retrace.interceptors.gemini import install_gemini_interceptor
install_gemini_interceptor()

Captured data includes model, messages, token usage, latency, and cost.

Transport Modes

| Mode | Protocol | Use Case |
| --- | --- | --- |
| `auto` | WebSocket with HTTP fallback | Default, recommended for most environments |
| `ws` | WebSocket only | Real-time streaming, long-running agents |
| `http` | HTTP batch | Serverless (Lambda, Cloud Functions) |

Configure transport explicitly:

retrace.configure(api_key="rt_live_...", transport="http")
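
The fallback behavior of `auto` can be sketched as an ordered list of senders that is tried until one succeeds. The function `send_with_fallback` and its sender callables are hypothetical; the choice to treat any `OSError` as a reason to fall back is an assumption, not documented SDK behavior.

```python
from typing import Callable, Optional

Sender = Callable[[bytes], None]


def send_with_fallback(payload: bytes, mode: str, ws_send: Sender, http_send: Sender) -> str:
    """Deliver payload and return the label of the transport that succeeded."""
    if mode == "ws":
        order = [("ws", ws_send)]
    elif mode == "http":
        order = [("http", http_send)]
    elif mode == "auto":
        order = [("ws", ws_send), ("http", http_send)]  # WebSocket first, HTTP second
    else:
        raise ValueError(f"unknown transport mode: {mode!r}")

    last_error: Optional[BaseException] = None
    for label, send in order:
        try:
            send(payload)
            return label
        except OSError as exc:  # ConnectionError and TimeoutError derive from OSError
            last_error = exc
    raise ConnectionError(f"all transports failed for mode {mode!r}") from last_error
```

With `mode="ws"` there is no second sender, so a connection failure surfaces immediately instead of silently switching protocols.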

Disabling in Tests

Prevent trace emission during test runs:

import os
os.environ["RETRACE_ENABLED"] = "false"
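
Setting the variable globally affects every later test in the same process. A scoped alternative is `unittest.mock.patch.dict`, which restores the environment on exit. In this sketch `tracing_enabled` and `call_with_tracing_disabled` are hypothetical helpers; whether the SDK reads `RETRACE_ENABLED` at import time, at `configure()` time, or per trace is not stated here, so the patch may need to wrap the `configure()` call as well.

```python
import os
from typing import Any, Callable
from unittest import mock


def tracing_enabled() -> bool:
    # Hypothetical helper mirroring the documented default of "true".
    return os.environ.get("RETRACE_ENABLED", "true").lower() != "false"


def call_with_tracing_disabled(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run fn with RETRACE_ENABLED=false, then restore the previous environment."""
    with mock.patch.dict(os.environ, {"RETRACE_ENABLED": "false"}):
        return fn(*args, **kwargs)
```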

[!TIP] Async is supported: @record traces async def functions, and the OpenAI/Anthropic/Gemini interceptors capture async (and async-streaming) calls. The transport layer itself is synchronous (websocket-client / requests); on async frameworks use transport="http" so emission never blocks the event loop.

Resumable Traces

Mark a trace as resumable to enable fork-and-replay from any span:

@retrace.record(name="my-agent", resumable=True)
def run_agent(prompt, context):
    # Each LLM call becomes a checkpoint
    plan = call_planner(prompt)
    result = call_executor(plan)
    return result

When a fork is triggered from a specific span, the agent can be re-executed from that point with modified inputs.
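
The idea of re-executing from a checkpoint can be illustrated without the SDK: keep the recorded outputs of the steps before the fork and run only the remaining steps on the modified input. The names `run_steps` and `fork_from` are hypothetical, and this sketch assumes each step takes the previous step's output as its only input.

```python
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def run_steps(steps: Sequence[Step], initial: Any) -> list[Any]:
    """Run every step in order and return each step's output (the checkpoints)."""
    outputs, value = [], initial
    for step in steps:
        value = step(value)
        outputs.append(value)
    return outputs


def fork_from(steps: Sequence[Step], recorded: Sequence[Any],
              fork_index: int, new_input: Any) -> list[Any]:
    """Reuse recorded outputs before fork_index, then re-execute from there."""
    if not 0 <= fork_index < len(steps):
        raise IndexError("fork_index out of range")
    outputs, value = list(recorded[:fork_index]), new_input
    for step in steps[fork_index:]:
        value = step(value)  # only steps at or after the fork are re-run
        outputs.append(value)
    return outputs
```

For example, with a planner step followed by an executor step, forking at index 1 keeps the recorded plan and re-runs only the executor on the edited input.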

Sampling

Control the fraction of traces that is recorded to reduce cost at high volume:

retrace.configure(
    api_key="rt_live_...",
    sample_rate=0.5,  # Record 50% of traces
)

Or via environment variable:

RETRACE_SAMPLE_RATE=0.1  # Record 10% of traces

Sampled-out traces execute normally with zero SDK overhead. Failures are always captured, regardless of sample rate.
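
The two rules above (a sample rate, plus failures that bypass it) can be expressed as a single decision function. This is a sketch under assumptions: `should_emit` is a hypothetical name, and it decides after the outcome is known, which is one way to honor the failure rule. How the SDK itself sequences the decision is not described here.

```python
import random


def should_emit(sample_rate: float, failed: bool, rng: random.Random) -> bool:
    """Return True if a finished trace should be sent."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    if failed:
        return True  # failures bypass sampling entirely
    # rng.random() is in [0.0, 1.0), so rate 0.0 never emits and 1.0 always does.
    return rng.random() < sample_rate
```

Passing the random generator in explicitly keeps the function easy to test with a fixed seed.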

Deterministic Sampling

For reproducible sampling decisions, provide a seed:

retrace.configure(
    api_key="rt_live_...",
    sample_rate=0.5,
    sample_seed="my-stable-seed",  # Same seed + function name = same decision
)

Or via environment variable:

RETRACE_SAMPLE_SEED=my-stable-seed
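
One common way to get the "same seed + function name = same decision" property is to hash the pair and compare the hash against the sample rate. The sketch below uses SHA-256 for that purpose; the function name `sampled` and the choice of hash are assumptions, and the SDK may derive its decision differently.

```python
import hashlib


def sampled(seed: str, name: str, sample_rate: float) -> bool:
    """Deterministically decide whether traces for `name` are recorded."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(f"{seed}:{name}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    # Comparing against rate * 2**64 keeps rate 1.0 always true and 0.0 always false.
    return bucket < sample_rate * 2**64
```

Because the decision depends only on the seed, the name, and the rate, every process and every run agrees on which functions are traced.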

W3C Traceparent Propagation

Inject distributed tracing headers into outgoing HTTP requests:

import requests

from retrace.traceparent import set_trace_context, inject_traceparent, parse_traceparent

# Set active context inside a traced function
set_trace_context(trace_id, span_id)

# Inject into outgoing requests
headers = inject_traceparent({"Content-Type": "application/json"})
# headers now includes: {"traceparent": "00-{trace_id}-{span_id}-01", ...}

requests.get("https://downstream.com/api", headers=headers)

# Parse incoming traceparent
result = parse_traceparent(request.headers["traceparent"])
# (trace_id, parent_id, sampled)
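
For reference, the header layout shown in the comment above follows the W3C Trace Context format: a 2-digit version, a 32-digit trace id, a 16-digit parent id, and 2-digit flags, all lowercase hex. The standalone sketch below formats and parses that layout; `format_traceparent` and `parse_traceparent_sketch` are hypothetical names and are independent of the SDK's own helpers. It accepts only the four-field version `00` layout.

```python
import re
from typing import Optional

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent value from hex ids."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent_sketch(header: str) -> Optional[tuple[str, str, bool]]:
    """Return (trace_id, parent_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The specification forbids version ff and all-zero trace or parent ids.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)  # bit 0 is "sampled"
```

Returning `None` for malformed input lets a service start a fresh trace instead of failing the request.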

Streaming Interception

The SDK automatically captures streaming responses from OpenAI and Anthropic:

# Streaming is intercepted transparently
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,  # ← SDK wraps the generator, emits span on completion
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
# Span emitted here with full output, tokens, and cost

Works identically for Anthropic:

with client.messages.stream(model="claude-sonnet-4", ...) as stream:
    for text in stream.text_stream:
        print(text, end="")
# Span captured automatically
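
Wrapping a generator so that a span is emitted on completion can be sketched in a few lines. This is an illustration of the pattern, not the SDK's interceptor: `capture_stream` and the `on_complete` callback are hypothetical, and chunks are simplified to plain strings.

```python
from typing import Callable, Iterable, Iterator


def capture_stream(chunks: Iterable[str], on_complete: Callable[[str], None]) -> Iterator[str]:
    """Pass chunks through unchanged and report the joined text once exhausted."""
    parts: list[str] = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk  # the consumer sees each chunk as soon as it arrives
    # Reached only when the consumer exhausts the stream.
    on_complete("".join(parts))
```

In this sketch a consumer that stops iterating early never triggers `on_complete`; a `try`/`finally` around the loop would be one way to report partial output in that case.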

Token ID Capture

The Python SDK supports storing token IDs and log-probabilities for speculative decoding during replay:

@retrace.record(name="my-agent")
def run(prompt: str):
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,  # Enables token ID capture
    )
    return response.choices[0].message.content

When logprobs=True is set, the interceptor extracts token IDs from the response and includes them in the span data. These are used during fork replay to achieve near-instant verification of unchanged outputs.
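
As a rough illustration of why stored token IDs make verification cheap, the sketch below compares a recorded token sequence with a replayed one and reports how many leading tokens agree, which is the acceptance rule used in speculative decoding. `matching_prefix_len` is a hypothetical helper; the actual replay mechanism is not described in this document.

```python
from typing import Sequence


def matching_prefix_len(recorded: Sequence[int], replayed: Sequence[int]) -> int:
    """Count leading token IDs that are identical in both sequences."""
    count = 0
    for old, new in zip(recorded, replayed):
        if old != new:
            break
        count += 1
    return count


def is_unchanged(recorded: Sequence[int], replayed: Sequence[int]) -> bool:
    """True when both sequences have the same length and every token matches."""
    return len(recorded) == len(replayed) == matching_prefix_len(recorded, replayed)
```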
