Enforcement (Circuit Breakers)

Guardrails monitor a run and halt it after a threshold is crossed. Enforcement is the active counterpart: a policy can block or hold an agent action before it executes. That is the difference between catching a runaway tool loop on span 3 and discovering it on the invoice after it has run 2,000 times.

Enforcement has two layers, and you'll usually use both:

Local limits (SDK, offline). Hard ceilings on steps / tokens / USD per run, enforced in-process with no network calls. They trip even when the API is unreachable.
Server policies (centrally managed). Budget ceilings, loop breakers, debounce windows, and action policies stored on the server and evaluated twice: at the SDK pre-call gate and again at ingest, where the evaluation is authoritative.
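
As a rough illustration of the local layer, the sketch below shows how an in-process ceiling can trip with no network involved. `LocalLimits` and `LimitExceeded` are hypothetical names chosen for this example, not the SDK's internals; the SDK's documented entry point is `retrace.configure`.

```python
from dataclasses import dataclass
from typing import Optional


class LimitExceeded(Exception):
    """Hypothetical stand-in for the SDK's typed enforcement error."""


@dataclass
class LocalLimits:
    """Per-run counters held in process memory (illustrative only)."""

    max_steps: Optional[int] = None
    max_usd: Optional[float] = None
    steps: int = 0
    usd: float = 0.0

    def record(self, usd: float) -> None:
        """Account for one finished step, then trip if a ceiling is crossed."""
        self.steps += 1
        self.usd += usd
        if self.max_steps is not None and self.steps > self.max_steps:
            raise LimitExceeded(f"steps {self.steps} > {self.max_steps} per run")
        if self.max_usd is not None and self.usd > self.max_usd:
            raise LimitExceeded(f"usd {self.usd:.2f} > {self.max_usd} per run")
```

Because the counters live in the process, the check costs nothing and keeps working when the API is unreachable. The trade-off is that it only sees one process, which is why server policies exist as the second layer.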

Policy types

Type	What it does	Key config
`budget_ceiling`	Block/hold when tokens, USD, or steps exceed a cap (per run and/or per day)	`max_tokens_per_run`, `max_usd_per_run`, `max_steps_per_run` (+ `_per_day`), `on_trip`
`loop_breaker`	Trip after the same tool + args repeats N times in a run	`max_repeats` (2–1000), `on_trip`
`debounce`	Trip on a repeated tool + args call inside a time window	`window_seconds` (1–86400), `on_trip`
`action_policy`	Match a tool-name pattern → a verdict	`tool_pattern` (e.g. `shell*`), `verdict`
`composite`	Combine ordered typed rules (tool pattern + per-run ceilings) with AND/OR → one verdict	`combinator` (`and`/`or`), `rules[]`, `on_trip`
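
To make `loop_breaker` and `debounce` concrete, here is a minimal sketch of the matching logic. It is an assumption-laden illustration, not the server's implementation: the fingerprinting scheme (`call_key`), the class names, and the choice to trip when the count exceeds `max_repeats` are all hypothetical.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict


def call_key(tool_name: str, tool_args: Dict[str, Any]) -> str:
    """Stable fingerprint of a tool call: same tool + same args -> same key."""
    payload = json.dumps({"tool": tool_name, "args": tool_args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LoopBreaker:
    """Trips once an identical call has been seen more than max_repeats times."""

    def __init__(self, max_repeats: int, on_trip: str = "block") -> None:
        self.max_repeats = max_repeats
        self.on_trip = on_trip
        self.seen: Counter = Counter()

    def check(self, key: str) -> str:
        self.seen[key] += 1
        # Assumption: the call that exceeds max_repeats is the one that trips.
        return self.on_trip if self.seen[key] > self.max_repeats else "allow"


class Debounce:
    """Trips when an identical call arrives again inside window_seconds."""

    def __init__(self, window_seconds: float, on_trip: str = "block") -> None:
        self.window_seconds = window_seconds
        self.on_trip = on_trip
        self.last_seen: Dict[str, float] = {}

    def check(self, key: str, now: float) -> str:
        previous = self.last_seen.get(key)
        self.last_seen[key] = now
        if previous is not None and now - previous < self.window_seconds:
            return self.on_trip
        return "allow"
```

Sorting keys before hashing means argument order does not change the fingerprint, so `{"q": "a", "n": 1}` and `{"n": 1, "q": "a"}` count as the same call.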

on_trip and verdict are one of allow, block, or hold. Hold-for-approval (hold) requires the Pro plan or higher; plain block is available on every plan.

Verdicts

allow — the action proceeds.
block — the action is refused; the SDK raises a typed error.
hold — the action waits for a human decision in the approval queue, or applies the fail-closed timeout verdict (default: deny) after RETRACE_ENFORCEMENT_HOLD_TIMEOUT_SECONDS.
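
The hold lifecycle can be sketched as a small pure function. This is illustrative only: `resolve_hold` is a hypothetical helper, and it assumes the default "deny" timeout verdict maps to `block` and that a human decision is either `allow` or `block`.

```python
from typing import Optional


def resolve_hold(
    decision: Optional[str],
    waited_seconds: float,
    timeout_seconds: float,
    timeout_verdict: str = "block",  # assumption: "deny" is expressed as block
) -> Optional[str]:
    """Return the final verdict for a held action, or None while still pending."""
    if decision in ("allow", "block"):
        return decision  # a human decided in the approval queue
    if waited_seconds >= timeout_seconds:
        return timeout_verdict  # nobody decided in time: fail closed
    return None  # still waiting for a decision
```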

Create a policy (API)

curl -X POST https://api.retraceai.tech/api/v1/enforcement/policies \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Per-run cost cap",
    "type": "budget_ceiling",
    "config": { "max_usd_per_run": 5, "on_trip": "block" }
  }'

The pre-call gate the SDKs use:

curl -X POST https://api.retraceai.tech/api/v1/enforcement/check \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "run_id": "run-123", "proposed_usd": 0.40, "tool_name": "web_search", "tool_args_hash": "a1b2c3" }'

A block returns 200 with { "verdict": "block", "reason": ... }; a hold returns 202 with a hold_id.
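
A caller of `/check` therefore has to look at both the status code and the body. The sketch below shows one way to do that in Python; `interpret_check` is a hypothetical helper, and it assumes that an allowed action also comes back as 200 with a `verdict` field, which the text above does not state.

```python
import json
from typing import Any, Dict, Optional, Tuple


def interpret_check(status: int, body: str) -> Tuple[str, Optional[str]]:
    """Map a /check HTTP response to (verdict, detail)."""
    data: Dict[str, Any] = json.loads(body) if body else {}
    if status == 202:
        # Hold: the action waits in the approval queue under this id.
        return "hold", data.get("hold_id")
    if status == 200:
        # Assumption: every 200 response carries a "verdict" field.
        return data["verdict"], data.get("reason")
    raise RuntimeError(f"unexpected /check status {status}")
```

Note that a block is not an HTTP error here, so code that only checks for a 2xx status would treat a blocked action as allowed.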

SDK local limits (Python)

import retrace
from retrace.errors import RetraceEnforcementError

retrace.configure(
    api_key="rt_live_...",
    max_steps_per_run=50,
    max_usd_per_run=2.0,
    server_enforcement=True,  # also consult centrally-managed server policies
)

@retrace.record(name="agent")
def run(prompt): ...

try:
    run("...")
except RetraceEnforcementError as e:
    print(e.verdict, e.reason)  # "block", "Local USD ceiling reached: $2.01 > $2.0 per run."

SDK local limits (TypeScript)

import { configure, trace, RetraceEnforcementError } from "retrace-sdk";

configure({
  apiKey: "rt_live_...",
  maxStepsPerRun: 50,
  maxUsdPerRun: 2.0,
  serverEnforcement: true,
});

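try {
  // runAgent is your own agent entry point (definition omitted here).
  await runAgent("...");
} catch (e) {
  if (e instanceof RetraceEnforcementError) {
    console.log(e.verdict, e.reason);
  } else {
    throw e; // rethrow anything that is not an enforcement verdict
  }
}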

Every setting also reads from an env var (RETRACE_MAX_STEPS_PER_RUN, RETRACE_MAX_TOKENS_PER_RUN, RETRACE_MAX_USD_PER_RUN, RETRACE_SERVER_ENFORCEMENT). Precedence is explicit code arg > env var > unset.
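
The precedence rule can be sketched as follows. `resolve_float` is a hypothetical helper written for this example, not an SDK function; only the env var names come from the text above.

```python
import os
from typing import Mapping, Optional


def resolve_float(
    explicit: Optional[float],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Explicit code arg wins, then the env var, otherwise the limit is unset."""
    if explicit is not None:
        return explicit
    raw = env.get(env_name)
    if raw is not None and raw.strip():
        return float(raw)
    return None  # unset: no ceiling of this kind is applied
```

For example, with `RETRACE_MAX_USD_PER_RUN=5` in the environment, passing `max_usd_per_run=2.0` in code would still yield a 2.0 ceiling under this rule.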

Failure semantics

Condition	Behavior
Local ceiling exceeded	Raises `RetraceEnforcementError` synchronously — the run stops before the next call. Never silent.
Server `/check` unreachable	Local limits still apply; the failure is logged. Server policies are best-effort from the SDK and authoritative at ingest.
Counter store (Valkey) down, server-side	Fail-CLOSED — an active counter-based policy blocks rather than letting unbounded spend through (same safety class as billing).
Policy is in `simulate` mode	Records a `simulated` event (would-have-blocked) but never changes the live verdict — tune ceilings on real traffic before arming.
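
The last two rows interact in a way that is easy to get wrong, so here is a minimal sketch of a counter-based budget check. Everything in it is an assumption for illustration: the function name, the event labels other than `simulated`, and the choice that a policy in `simulate` mode stays non-blocking even when the counter store is down.

```python
from typing import Callable, List, Tuple


def evaluate_budget(
    max_usd_per_run: float,
    on_trip: str,
    simulate: bool,
    read_spent_usd: Callable[[], float],
) -> Tuple[str, List[str]]:
    """Return (live_verdict, recorded_events) for one counter-based policy."""
    try:
        spent = read_spent_usd()
    except ConnectionError:
        if simulate:
            # Assumption: simulate mode never changes the live verdict.
            return "allow", ["simulated"]
        # Counter store unavailable: fail closed instead of allowing blind spend.
        return "block", ["counter_store_unavailable"]
    if spent <= max_usd_per_run:
        return "allow", []
    if simulate:
        # Would have tripped, but only the event is recorded.
        return "allow", ["simulated"]
    return on_trip, [on_trip]
```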

Blocked and held actions are also recorded as detections (detector enforcement), so they appear in the trace detections panel and the Enforcement → Events feed.

Plans

enforcement_policy is a structural cap (Free: 2 active policies; Pro and above: unlimited). Hold-for-approval action policies require the enforcement_hold feature (Pro and above). A policy creation request that your plan does not allow returns a 403, which is distinct from a runtime policy block.

Enforcement Gateway (any language / framework)

The SDK gate is advisory; the gateway is in the request path, so enforcement is guaranteed for any client. Point your OpenAI/Anthropic SDK at the gateway base URL and authenticate with your Retrace key — your provider key is passed per-request and never stored or logged.

# OpenAI-compatible
curl -X POST https://api.retraceai.tech/gateway/v1/chat/completions \
  -H "x-retrace-key: rt_live_..." \
  -H "Authorization: Bearer sk-...your-openai-key..." \
  -H "x-retrace-run-id: run-123" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5.5", "messages": [{"role":"user","content":"hi"}] }'

Policies are evaluated before the request is forwarded; a block returns 403 (fail-closed).
Streaming responses pass through unbuffered.
Every call is captured as a normal trace (metered + PII-redacted).
Anthropic: POST /gateway/v1/messages with x-api-key: sk-ant-....
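
The same OpenAI-compatible call can be built from Python with only the standard library. This is a sketch that mirrors the curl example; `build_gateway_request` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request
from typing import Dict, List

GATEWAY_BASE = "https://api.retraceai.tech/gateway/v1"


def build_gateway_request(
    retrace_key: str,
    provider_key: str,
    run_id: str,
    model: str,
    messages: List[Dict[str, str]],
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed via the gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY_BASE}/chat/completions",
        data=body,
        method="POST",
        headers={
            "x-retrace-key": retrace_key,               # authenticates to Retrace
            "Authorization": f"Bearer {provider_key}",  # passed through per request
            "x-retrace-run-id": run_id,
            "Content-Type": "application/json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on a 4xx response, so a policy block would surface as an exception whose `code` is 403.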

[!NOTE] Streaming + token enforcement. Streaming responses are forwarded unbuffered, so the gateway does not read the output stream and cannot enforce max_usd_per_run or token ceilings on the output of a streaming call after the fact. For streaming calls, enforcement is pre-call only: set max_tokens on the request (OpenAI and Anthropic alike) and the gateway enforces against that proposed budget before forwarding. Non-streaming calls are metered on their real usage.
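
One way to picture the pre-call check for a streaming request is the sketch below. It is an assumption-heavy illustration, not the gateway's pricing logic: the function name and the per-token price are hypothetical, and input tokens are ignored to keep it short.

```python
def precall_verdict(
    spent_usd: float,
    max_usd_per_run: float,
    max_tokens: int,
    usd_per_output_token: float,  # hypothetical price, supplied by the caller
    on_trip: str = "block",
) -> str:
    """Treat max_tokens as the worst-case output and check it against the cap."""
    proposed_usd = max_tokens * usd_per_output_token
    if spent_usd + proposed_usd > max_usd_per_run:
        return on_trip
    return "allow"
```

Under this model a streaming call with a large `max_tokens` can be refused even if the real output would have been short, which is the cost of enforcing before any bytes are streamed.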

[!NOTE] Quota accounting. The gateway reserves your trace_created quota after the provider responds (so a failed provider call isn't charged). A gateway-only workload can therefore exceed its monthly trace_created limit by at most one call before the meter catches up — you've already paid the provider for that call directly, so no usage is lost.
