Enforcement
Circuit breakers that block or hold an agent action before it runs — not just flag it after.
Enforcement (Circuit Breakers)
Guardrails monitor a run and HALT it after a threshold is crossed. Enforcement is the active counterpart: a policy can block or hold an agent action before it executes — the difference between catching a runaway tool loop on span 3 and discovering it on the invoice after it ran 2,000 times.
Enforcement has two layers, and you'll usually use both:
- Local limits (SDK, offline). Hard ceilings on steps / tokens / USD per run, enforced in-process with zero network. They trip even when the API is unreachable.
- Server policies (centrally managed). Budget ceilings, loop breakers, debounce windows, and action policies stored on the server and evaluated at the SDK pre-call gate and at ingest. Authoritative at ingest.
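To make the local layer concrete, here is a minimal sketch of an in-process ceiling tracker. It is not the SDK's implementation: `LocalLimits`, `LocalLimitExceeded`, and `record` are hypothetical names, and the only behavior taken from this page is that steps, tokens, and USD are capped per run with no network call.

```python
from dataclasses import dataclass
from typing import Optional


class LocalLimitExceeded(Exception):
    """Hypothetical stand-in for the SDK's typed enforcement error."""


@dataclass
class LocalLimits:
    # A ceiling of None means "no limit configured" for that dimension.
    max_steps: Optional[int] = None
    max_tokens: Optional[int] = None
    max_usd: Optional[float] = None
    steps: int = 0
    tokens: int = 0
    usd: float = 0.0

    def record(self, tokens: int = 0, usd: float = 0.0) -> None:
        """Count one step plus its usage, then raise if any ceiling is exceeded."""
        self.steps += 1
        self.tokens += tokens
        self.usd += usd
        if self.max_steps is not None and self.steps > self.max_steps:
            raise LocalLimitExceeded(f"steps {self.steps} > {self.max_steps} per run")
        if self.max_tokens is not None and self.tokens > self.max_tokens:
            raise LocalLimitExceeded(f"tokens {self.tokens} > {self.max_tokens} per run")
        if self.max_usd is not None and self.usd > self.max_usd:
            raise LocalLimitExceeded(f"${self.usd} > ${self.max_usd} per run")
```

Because the counters live in the process, a check like this keeps working when the API is unreachable, which is the property the local layer is described as providing.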
Policy types
| Type | What it does | Key config |
|---|---|---|
| budget_ceiling | Block/hold when tokens, USD, or steps exceed a cap (per run and/or per day) | max_tokens_per_run, max_usd_per_run, max_steps_per_run (+ _per_day), on_trip |
| loop_breaker | Trip after the same tool + args repeats N times in a run | max_repeats (2–1000), on_trip |
| debounce | Trip on a repeated tool + args call inside a time window | window_seconds (1–86400), on_trip |
| action_policy | Match a tool-name pattern → a verdict | tool_pattern (e.g. shell*), verdict |
| composite | Combine ordered typed rules (tool pattern + per-run ceilings) with AND/OR → one verdict | combinator (and/or), rules[], on_trip |
on_trip and verdict are one of allow, block, or hold. Hold-for-approval (hold) requires the Pro plan or higher; plain block is available on every plan.
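The sketch below illustrates how a loop_breaker and an action_policy could behave, using only the config keys from the table. Several details are assumptions rather than documented behavior: the argument hashing scheme (`args_hash`), glob-style matching for `tool_pattern`, and the choice to trip on the first call that pushes the count above `max_repeats`.

```python
import fnmatch
import hashlib
import json
from collections import Counter


def args_hash(args: dict) -> str:
    """Stable digest of tool arguments (the hashing scheme is an assumption)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class LoopBreaker:
    """Counts identical (tool, args) calls within one run."""

    def __init__(self, max_repeats: int, on_trip: str = "block"):
        if not 2 <= max_repeats <= 1000:
            raise ValueError("max_repeats must be between 2 and 1000")
        self.max_repeats = max_repeats
        self.on_trip = on_trip
        self.seen = Counter()

    def check(self, tool: str, args: dict) -> str:
        key = (tool, args_hash(args))
        self.seen[key] += 1
        # Assumption: the policy trips once the count exceeds max_repeats.
        return self.on_trip if self.seen[key] > self.max_repeats else "allow"


def action_policy(tool: str, tool_pattern: str, verdict: str) -> str:
    """Return the configured verdict when the tool name matches the pattern."""
    # Assumption: patterns such as "shell*" are shell-style globs.
    return verdict if fnmatch.fnmatchcase(tool, tool_pattern) else "allow"
```

Keying on a hash of canonicalized arguments means two calls with the same arguments in a different key order count as the same call.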
Verdicts
- allow — the action proceeds.
- block — the action is refused; the SDK raises a typed error.
- hold — the action waits for a human decision in the approval queue, or applies the fail-closed timeout verdict (default: deny) after RETRACE_ENFORCEMENT_HOLD_TIMEOUT_SECONDS.
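The hold behavior can be pictured as a small resolution function. This is a sketch under stated assumptions: `resolve_hold`, the `"pending"` state, and the approver decision strings are hypothetical, while the fail-closed default of deny after the timeout comes from the description above.

```python
from typing import Optional


def resolve_hold(
    decision: Optional[str],
    waited_seconds: float,
    timeout_seconds: float,
    timeout_verdict: str = "deny",
) -> str:
    """Resolve a held action from a human decision or the timeout."""
    if decision is not None:
        # A human decision from the approval queue always wins.
        return decision
    if waited_seconds >= timeout_seconds:
        # No decision arrived in time: apply the fail-closed verdict.
        return timeout_verdict
    return "pending"
```

The point of the default is that an unattended approval queue ends in a refusal and never in silent execution.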
Create a policy (API)
curl -X POST https://api.retraceai.tech/api/v1/enforcement/policies \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Per-run cost cap",
    "type": "budget_ceiling",
    "config": { "max_usd_per_run": 5, "on_trip": "block" }
  }'
The pre-call gate the SDKs use:
curl -X POST https://api.retraceai.tech/api/v1/enforcement/check \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "run_id": "run-123", "proposed_usd": 0.40, "tool_name": "web_search", "tool_args_hash": "a1b2c3" }'
A block returns 200 with { "verdict": "block", "reason": ... }; a hold returns 202 with a hold_id.
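A client calling the gate directly has to map the status code and body to a verdict. The following sketch shows one way to do that; `GateResult` and `interpret_check` are hypothetical helpers, and the shape of an allow response (assumed here to be a 200 carrying a `verdict` field) is not stated on this page.

```python
from typing import NamedTuple, Optional


class GateResult(NamedTuple):
    verdict: str
    reason: Optional[str] = None
    hold_id: Optional[str] = None


def interpret_check(status: int, body: dict) -> GateResult:
    """Map a /check response (status code + parsed JSON body) to a verdict."""
    if status == 202:
        # A hold is signalled by 202 and carries a hold_id.
        return GateResult("hold", body.get("reason"), body.get("hold_id"))
    if status == 200:
        # A block arrives as 200 with verdict "block"; other 200 verdicts
        # (for example "allow") are assumed to use the same field.
        return GateResult(body["verdict"], body.get("reason"))
    raise RuntimeError(f"unexpected /check status {status}")
```

Note that a block is not an HTTP error on this endpoint, so checking the status code alone is not enough.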
SDK local limits (Python)
import retrace
from retrace.errors import RetraceEnforcementError

retrace.configure(
    api_key="rt_live_...",
    max_steps_per_run=50,
    max_usd_per_run=2.0,
    server_enforcement=True,  # also consult centrally-managed server policies
)

@retrace.record(name="agent")
def run(prompt): ...

try:
    run("...")
except RetraceEnforcementError as e:
    print(e.verdict, e.reason)  # "block", "Local USD ceiling reached: $2.01 > $2.0 per run."
SDK local limits (TypeScript)
import { configure, trace, RetraceEnforcementError } from "retrace-sdk";
configure({
  apiKey: "rt_live_...",
  maxStepsPerRun: 50,
  maxUsdPerRun: 2.0,
  serverEnforcement: true,
});
try {
  await runAgent("...");
} catch (e) {
  if (e instanceof RetraceEnforcementError) console.log(e.verdict, e.reason);
}
Every setting also reads from an env var (RETRACE_MAX_STEPS_PER_RUN, RETRACE_MAX_TOKENS_PER_RUN, RETRACE_MAX_USD_PER_RUN, RETRACE_SERVER_ENFORCEMENT). Precedence is explicit code arg > env var > unset.
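The precedence rule can be sketched as a small resolver. `resolve_limit` is a hypothetical helper, not part of the SDK, and treating an empty env var as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_limit(
    code_arg: Optional[float],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Resolve one ceiling: explicit code arg > env var > unset (None)."""
    if code_arg is not None:
        return code_arg
    raw = env.get(env_name)
    if raw is None or raw.strip() == "":
        # Assumption: an empty variable counts as unset.
        return None
    return float(raw)
```

For example, with RETRACE_MAX_USD_PER_RUN=9 in the environment, passing 2.0 in code still yields 2.0, and omitting the argument yields 9.0.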
Failure semantics
| Condition | Behavior |
|---|---|
| Local ceiling exceeded | Raises RetraceEnforcementError synchronously — the run stops before the next call. Never silent. |
| Server /check unreachable | Local limits still apply; the failure is logged. Server policies are best-effort from the SDK and authoritative at ingest. |
| Counter store (Valkey) down, server-side | Fail-CLOSED — an active counter-based policy blocks rather than letting unbounded spend through (same safety class as billing). |
| Policy is in simulate mode | Records a simulated event (would-have-blocked) but never changes the live verdict — tune ceilings on real traffic before arming. |
Blocked and held actions are also recorded as detections (detector enforcement), so they appear in the trace detections panel and the Enforcement → Events feed.
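Two rows of the table, the fail-closed counter store and simulate mode, are easiest to see side by side in code. This is an illustrative sketch, not the server's logic: `CounterStoreDown` and `evaluate_budget` are hypothetical, and letting a simulate-mode policy allow the call even when the store is down is an assumption drawn from "never changes the live verdict".

```python
from typing import Callable


class CounterStoreDown(Exception):
    """Hypothetical error raised when the counter store cannot be read."""


def evaluate_budget(
    read_spent_usd: Callable[[], float],
    cap_usd: float,
    proposed_usd: float,
    on_trip: str = "block",
    simulate: bool = False,
) -> dict:
    """Evaluate a per-run USD ceiling against spend read from a counter store."""
    try:
        spent = read_spent_usd()
    except CounterStoreDown:
        if simulate:
            # Assumption: simulate mode never alters the live verdict.
            return {"verdict": "allow", "simulated": True, "reason": "counter store unavailable"}
        # Fail-closed: an armed counter-based policy blocks when it cannot count.
        return {"verdict": "block", "simulated": False, "reason": "counter store unavailable"}
    total = spent + proposed_usd
    if total <= cap_usd:
        return {"verdict": "allow", "simulated": False, "reason": None}
    reason = f"would spend ${total:.2f} > ${cap_usd:.2f} per run"
    if simulate:
        # Record a would-have-blocked event but let the action proceed.
        return {"verdict": "allow", "simulated": True, "reason": reason}
    return {"verdict": on_trip, "simulated": False, "reason": reason}
```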
Plans
enforcement_policy is a structural cap (Free: 2 active policies; Pro and above: unlimited). Hold-for-approval action policies require the enforcement_hold feature (Pro and above). A creation blocked by your plan returns a 403 distinct from a runtime policy block.
Enforcement Gateway (any language / framework)
The SDK gate is advisory; the gateway is in the request path, so enforcement is guaranteed for any client. Point your OpenAI/Anthropic SDK at the gateway base URL and authenticate with your Retrace key — your provider key is passed per-request and never stored or logged.
# OpenAI-compatible
curl -X POST https://api.retraceai.tech/gateway/v1/chat/completions \
  -H "x-retrace-key: rt_live_..." \
  -H "Authorization: Bearer sk-...your-openai-key..." \
  -H "x-retrace-run-id: run-123" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5.5", "messages": [{"role":"user","content":"hi"}] }'
- Policies are evaluated before the request is forwarded; a block returns 403 (fail-closed).
- Streaming responses pass through unbuffered.
- Every call is captured as a normal trace (metered + PII-redacted).
- Anthropic: POST /gateway/v1/messages with x-api-key: sk-ant-....
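For clients that build requests by hand, the two gateway routes differ only in path and provider auth header. The helper below is a hypothetical sketch that assembles the URL and headers without sending anything; it assumes the Anthropic route accepts the same x-retrace-key and x-retrace-run-id headers as the OpenAI-compatible route, which this page does not state explicitly.

```python
from typing import Dict, Tuple

GATEWAY_BASE = "https://api.retraceai.tech/gateway/v1"


def gateway_request(
    provider: str, retrace_key: str, provider_key: str, run_id: str
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a gateway call; no network I/O is performed."""
    headers = {
        "x-retrace-key": retrace_key,
        "x-retrace-run-id": run_id,  # assumed to apply to both providers
        "Content-Type": "application/json",
    }
    if provider == "openai":
        headers["Authorization"] = f"Bearer {provider_key}"
        return f"{GATEWAY_BASE}/chat/completions", headers
    if provider == "anthropic":
        headers["x-api-key"] = provider_key
        return f"{GATEWAY_BASE}/messages", headers
    raise ValueError(f"unsupported provider: {provider}")
```

The returned pair can be handed to any HTTP client. A 403 from the gateway should then be treated as a policy block, not as a provider auth failure.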
[!NOTE] Streaming + token enforcement. Streaming responses are forwarded unbuffered, so the gateway does not read the output stream and therefore cannot enforce max_usd_per_run / token ceilings on the output of a streaming call after the fact. For streaming calls, enforcement is pre-call only: set max_tokens (OpenAI) / max_tokens (Anthropic) on the request and the gateway enforces against that proposed budget before forwarding. Non-streaming calls are metered on their real usage.
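To illustrate what enforcing against a proposed budget could look like for a streaming call, the sketch below turns a request's max_tokens into a worst-case cost and compares it with the remaining per-run budget. The function names and the per-1k-token prices are hypothetical inputs, and how the gateway actually prices a proposed call is not described here.

```python
def proposed_usd(
    prompt_tokens: int,
    max_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Worst-case cost of one call if the model emits all max_tokens."""
    return (
        prompt_tokens / 1000 * usd_per_1k_input
        + max_tokens / 1000 * usd_per_1k_output
    )


def precall_verdict(
    spent_usd: float, proposed: float, max_usd_per_run: float, on_trip: str = "block"
) -> str:
    """Decide before forwarding, since streamed output is never metered."""
    return on_trip if spent_usd + proposed > max_usd_per_run else "allow"
```

With a pre-call check of this kind, a smaller max_tokens lowers the proposed budget, so a request that would be blocked at 2000 output tokens might pass at 500.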
[!NOTE] Quota accounting. The gateway reserves your trace_created quota after the provider responds (so a failed provider call isn't charged). A gateway-only workload can therefore exceed its monthly trace_created limit by at most one call before the meter catches up. You've already paid the provider for that call directly, so no usage is lost.