GitHub Action Gate
Fail CI on regression-replay divergence, schema violations, or new high-severity detections.
The @retrace/action gates your CI on agent quality. It runs in two modes.
Detection gate (mode: detections)
Fail a build on regression-replay divergence (2E), schema violations (2C), or new high-severity detections — with a link back to the failing run's shared trace.

    - name: Retrace Detection Gate
      uses: yash1511-bogam/retrace/packages/action@v0
      with:
        api-key: ${{ secrets.RETRACE_API_KEY }}
        mode: detections
        fail-on: regression,schema_violation,high_severity # any subset
        trace-id: latest # for schema/high-severity checks
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for the PR comment

What each signal checks:
- regression: replays your golden set (POST /golden-set/regression-replay) and fails if any golden trace structurally regressed.
- schema_violation: fails if the trace has any schema_violation detection.
- high_severity: fails if the trace has any high/critical detection.
On failure the action posts a PR comment with the offending signals and an unlisted link to the failing run.
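
The decision logic described above can be sketched in Python. This is an illustration only, not the action's source: the `Detection` record, its `kind` and `severity` fields, and the `gate_reasons` helper are hypothetical names, and the regression-replay result is passed in as a boolean rather than fetched from the API.

```python
from dataclasses import dataclass

# Signal names accepted by `fail-on`, as listed above.
KNOWN_SIGNALS = {"regression", "schema_violation", "high_severity"}


@dataclass(frozen=True)
class Detection:
    """Hypothetical shape of one detection attached to a trace."""
    kind: str       # e.g. "schema_violation"
    severity: str   # e.g. "low", "medium", "high", "critical"


def parse_fail_on(raw: str) -> set[str]:
    """Split the comma-separated `fail-on` input and reject unknown signals."""
    signals = {part.strip() for part in raw.split(",") if part.strip()}
    unknown = signals - KNOWN_SIGNALS
    if unknown:
        raise ValueError(f"unknown fail-on signal(s): {sorted(unknown)}")
    return signals


def gate_reasons(fail_on: str, detections: list[Detection], golden_regressed: bool) -> list[str]:
    """Return the signals that tripped; an empty list means the gate passes."""
    enabled = parse_fail_on(fail_on)
    reasons = []
    if "regression" in enabled and golden_regressed:
        reasons.append("regression")
    if "schema_violation" in enabled and any(d.kind == "schema_violation" for d in detections):
        reasons.append("schema_violation")
    if "high_severity" in enabled and any(d.severity in ("high", "critical") for d in detections):
        reasons.append("high_severity")
    return reasons
```

Only signals named in `fail-on` can fail the build, so a trace with a critical detection still passes a gate configured with `fail-on: regression` alone.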
Eval gate (mode: eval, default)

    - uses: yash1511-bogam/retrace/packages/action@v0
      with:
        api-key: ${{ secrets.RETRACE_API_KEY }}
        evaluation-id: ${{ secrets.RETRACE_EVAL_ID }}
        trace-id: latest
        threshold: "0.8"

Fails the build if the evaluation score falls below threshold.
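
As a rough sketch of the comparison, the check might look like the following. The function name `eval_gate` is hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption drawn from the phrase "falls below".

```python
def eval_gate(score: float, threshold: str) -> str:
    """Return "pass" or "fail" for an evaluation score.

    `threshold` arrives as a string because GitHub Action inputs are
    always strings (hence the quoted "0.8" in the workflow).
    """
    limit = float(threshold)
    # "Falls below" is read here as strictly less than, so a score that
    # equals the threshold passes. That reading is an assumption.
    return "fail" if score < limit else "pass"
```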
Outputs
| Output | Mode | Description |
|---|---|---|
| status | both | pass or fail |
| score | eval | overall evaluation score |
| run-id | eval | eval run ID |
| reasons | detections | why the gate failed |
Golden-cassette regression gate (ci replay)
A second, offline gate complements the eval/detection gate above: re-run your agent against a committed golden cassette and fail the PR if the behavior diverges. It makes no network calls and needs no API key — it's a pure file comparison, so the verdict is deterministic.
1. Record a golden cassette
Run your agent once and write the cassette to a file you commit:

Python:

    import retrace

    @retrace.record(name="my-agent")
    def run(task): ...

    run("summarize Q3 report")
    retrace.write_golden_cassette("cassettes/summarize.golden.json")

TypeScript:

    import { trace, writeGoldenCassette } from "retrace-sdk";

    const run = trace(async (task: string) => { /* ... */ }, { name: "my-agent" });
    await run("summarize Q3 report");
    writeGoldenCassette("cassettes/summarize.golden.json");

2. Gate in CI
In CI, re-run the agent (the SDK replays recorded model calls, so it's deterministic), write a candidate cassette, then diff it against the golden with the CLI or the action:

With the CLI:

    retrace ci replay --golden cassettes/summarize.golden.json --candidate out/summarize.candidate.json

With the action:

    - uses: yash1511-bogam/retrace/.github/actions/ci-replay@<sha>
      with:
        golden: cassettes/summarize.golden.json
        candidate: out/summarize.candidate.json
        cli-version: "1.5.3" # pin: never float to latest in CI

Exit codes (stable, since they're an API): 0 match · 1 behavioral divergence · 2 usage/IO error.
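
Because the exit codes are stable, a wrapper script can branch on them. The sketch below is one way to do that from Python; the helper names and verdict labels are hypothetical, and only the command and flags shown above are used.

```python
import subprocess

# Exit codes documented above; the labels are paraphrased for this sketch.
VERDICTS = {0: "match", 1: "divergence", 2: "usage_or_io_error"}


def interpret_exit_code(code: int) -> str:
    """Map a `retrace ci replay` exit code to a verdict label."""
    # Anything outside the documented set is surfaced rather than guessed at.
    return VERDICTS.get(code, f"unexpected_exit_{code}")


def run_replay(golden: str, candidate: str) -> str:
    """Run the CLI with the flags shown above and return the verdict label."""
    result = subprocess.run(
        ["retrace", "ci", "replay", "--golden", golden, "--candidate", candidate],
        check=False,  # a non-zero exit is a verdict here, not an exception
    )
    return interpret_exit_code(result.returncode)
```

Separating 1 from 2 lets a pipeline treat a behavioral divergence as a failed gate while flagging a usage or IO error as a broken setup. `run_replay` raises `FileNotFoundError` if the `retrace` CLI is not on the `PATH`.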
Divergence budgets & flaky steps
Exact match is brittle, so the golden cassette can carry a tolerance block:

    { "tolerance": { "default": "exact", "steps": { "fetch-time": "ignore", "summary": "semantic" } } }

- exact (default): value-identical.
- ignore: never gate this step (a known-nondeterministic tool).
- semantic / judge: equivalence by embedding similarity or an LLM judge (evaluated server-side; the offline CLI reports these steps rather than failing on them).
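
To make the three modes concrete, here is a minimal sketch of an offline comparison that honors a tolerance block. The cassette layout is an assumption: a top-level `steps` mapping from step name to recorded output is hypothetical, as is the `compare` helper, and the real cassette format may differ.

```python
_MISSING = object()  # marks a step that is absent from the candidate


def compare(golden: dict, candidate: dict) -> dict:
    """Compare two cassettes step by step under the golden's tolerance block."""
    tolerance = golden.get("tolerance", {})
    default_mode = tolerance.get("default", "exact")
    per_step = tolerance.get("steps", {})

    failed, reported, ignored = [], [], []
    for name, expected in golden["steps"].items():  # hypothetical layout
        mode = per_step.get(name, default_mode)
        if mode == "ignore":
            ignored.append(name)   # never gates the build
        elif mode in ("semantic", "judge"):
            reported.append(name)  # judged server-side; only reported offline
        elif candidate["steps"].get(name, _MISSING) != expected:
            failed.append(name)    # exact mode: any value difference diverges
    return {"failed": failed, "reported": reported, "ignored": ignored}
```

With the tolerance block shown above, a changed `fetch-time` lands in `ignored`, `summary` lands in `reported`, and every other step must match exactly or it appears in `failed`.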
Pass two or more --candidate cassettes (repeat the run N times) and any step whose output varies
across them with zero substitutions is auto-marked nondeterministic and excluded from gating —
reported, never silently hidden.
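
The idea of detecting flaky steps from repeated runs can be sketched as follows. This assumes the same hypothetical cassette layout (a top-level `steps` mapping from step name to output) and a hypothetical helper name; it does not model the "zero substitutions" condition, which is not defined in this section.

```python
import json


def nondeterministic_steps(candidates: list[dict]) -> set[str]:
    """Names of steps whose output differs between any two candidate runs."""
    if len(candidates) < 2:
        return set()  # a single run cannot show variation

    names = set().union(*(c["steps"].keys() for c in candidates))
    flaky = set()
    for name in names:
        # Serialize with sorted keys so dict key order alone never counts
        # as variation. A step missing from a run serializes as "null".
        seen = {json.dumps(c["steps"].get(name), sort_keys=True) for c in candidates}
        if len(seen) > 1:
            flaky.add(name)
    return flaky
```

Steps returned here would be excluded from gating but still listed in the report, in line with the "reported, never silently hidden" rule above.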