GitHub Action Gate

The @retrace/action gates your CI on agent quality. It runs in two modes.

Detection gate (mode: `detections`)

Fail a build on regression-replay divergence (2E), schema violations (2C), or new high-severity detections — with a link back to the failing run's shared trace.

- name: Retrace Detection Gate
  uses: yash1511-bogam/retrace/packages/action@v0
  with:
    api-key: ${{ secrets.RETRACE_API_KEY }}
    mode: detections
    fail-on: regression,schema_violation,high_severity   # any subset
    trace-id: latest                                     # for schema/high-severity checks
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}            # for the PR comment

What each signal checks:

- `regression`: replays your golden set (`POST /golden-set/regression-replay`) and fails if any golden trace structurally regressed.
- `schema_violation`: fails if the trace has any `schema_violation` detection.
- `high_severity`: fails if the trace has any high or critical detection.

On failure the action posts a PR comment with the offending signals and an unlisted link to the failing run.
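The way the three signals combine can be pictured with a small sketch. This is not the action's actual implementation: the detection shape (`type` and `severity` keys), the `regressed_traces` count, and the function name `gate_reasons` are all assumptions made for illustration.

```python
# Hypothetical detection shape: {"type": str, "severity": str}.
HIGH_SEVERITIES = {"high", "critical"}


def gate_reasons(fail_on, detections, regressed_traces=0):
    """Return the signals that would fail the gate; an empty list means pass."""
    reasons = []
    # regression: any golden trace that structurally regressed on replay.
    if "regression" in fail_on and regressed_traces > 0:
        reasons.append("regression")
    # schema_violation: any detection of that type on the trace.
    if "schema_violation" in fail_on and any(
        d["type"] == "schema_violation" for d in detections
    ):
        reasons.append("schema_violation")
    # high_severity: any detection rated high or critical.
    if "high_severity" in fail_on and any(
        d["severity"] in HIGH_SEVERITIES for d in detections
    ):
        reasons.append("high_severity")
    return reasons
```

Only signals listed in `fail-on` can fail the build, which is why the sketch checks membership in `fail_on` before looking at the data.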

Eval gate (mode: `eval`, default)

- uses: yash1511-bogam/retrace/packages/action@v0
  with:
    api-key: ${{ secrets.RETRACE_API_KEY }}
    evaluation-id: ${{ secrets.RETRACE_EVAL_ID }}
    trace-id: latest
    threshold: "0.8"

Fails the build if the evaluation score falls below `threshold`.

Outputs

| Output | Mode | Description |
| --- | --- | --- |
| `status` | both | `pass` or `fail` |
| `score` | eval | overall evaluation score |
| `run-id` | eval | eval run ID |
| `reasons` | detections | why the gate failed |

Golden-cassette regression gate (`ci replay`)

A second, offline gate complements the eval/detection gate above: re-run your agent against a committed golden cassette and fail the PR if the behavior diverges. It makes no network calls and needs no API key — it's a pure file comparison, so the verdict is deterministic.

1. Record a golden cassette

Run your agent once and write the cassette to a file you commit:

# Python
import retrace

@retrace.record(name="my-agent")
def run(task): ...

run("summarize Q3 report")
retrace.write_golden_cassette("cassettes/summarize.golden.json")

// TypeScript
import { trace, writeGoldenCassette } from "retrace-sdk";

const run = trace(async (task: string) => { /* ... */ }, { name: "my-agent" });
await run("summarize Q3 report");
writeGoldenCassette("cassettes/summarize.golden.json");

2. Gate in CI

In CI, re-run the agent (the SDK replays recorded model calls, so it's deterministic), write a candidate cassette, then diff it against the golden with the CLI or the action:

retrace ci replay --golden cassettes/summarize.golden.json --candidate out/summarize.candidate.json

- uses: yash1511-bogam/retrace/.github/actions/ci-replay@<sha>
  with:
    golden: cassettes/summarize.golden.json
    candidate: out/summarize.candidate.json
    cli-version: "1.5.3"   # pin — never float to latest in CI

Exit codes (stable — they're an API): 0 match · 1 behavioral divergence · 2 usage/IO error.
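Because the exit codes are stable, a wrapper script can branch on them. The sketch below shells out to the CLI with the same flags shown above. The helper names `describe_exit_code` and `run_replay_gate` are hypothetical, and it assumes `retrace` is on `PATH`.

```python
import subprocess

# Exit-code meanings as documented above.
EXIT_MEANINGS = {0: "match", 1: "behavioral divergence", 2: "usage/IO error"}


def describe_exit_code(code: int) -> str:
    """Map a `retrace ci replay` exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_replay_gate(golden: str, candidate: str) -> int:
    """Run the CLI comparison and return its exit code unchanged."""
    result = subprocess.run(
        ["retrace", "ci", "replay", "--golden", golden, "--candidate", candidate],
        check=False,  # a non-zero exit is a verdict here, not an exception
    )
    print(f"retrace ci replay: {describe_exit_code(result.returncode)}")
    return result.returncode
```

Returning the code unchanged lets the calling CI step distinguish a real divergence (1) from a misconfigured path (2).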

Divergence budgets & flaky steps

Exact match is brittle, so the golden cassette can carry a tolerance block:

{ "tolerance": { "default": "exact", "steps": { "fetch-time": "ignore", "summary": "semantic" } } }

- `exact` (default): value-identical.
- `ignore`: never gate this step (for example, a known-nondeterministic tool).
- `semantic` / `judge`: equivalence by embedding similarity or an LLM judge (evaluated server-side; the offline CLI reports these steps rather than failing on them).
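As a rough illustration of how these modes could apply during an offline comparison, the sketch below walks each golden step and applies its tolerance. The cassette layout (a flat mapping of step name to output) and the function name `compare_steps` are assumptions, not the real cassette schema. It also assumes `semantic` and `judge` steps are reported only when their values differ.

```python
def compare_steps(golden_steps, candidate_steps, tolerance):
    """Return (failures, reported) lists of step names for an offline diff."""
    default = tolerance.get("default", "exact")
    per_step = tolerance.get("steps", {})
    failures, reported = [], []
    for name, expected in golden_steps.items():
        mode = per_step.get(name, default)
        if mode == "ignore":
            continue  # never gate this step
        if candidate_steps.get(name) == expected:
            continue  # value-identical, nothing to do
        if mode in ("semantic", "judge"):
            reported.append(name)  # needs server-side evaluation; do not fail
        else:
            failures.append(name)  # exact mode: any difference is a divergence
    return failures, reported
```

With the tolerance block shown above, a changed `fetch-time` is skipped, a changed `summary` is reported, and any other changed step fails the gate.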

Pass two or more --candidate cassettes (repeat the run N times) and any step whose output varies across them with zero substitutions is auto-marked nondeterministic and excluded from gating — reported, never silently hidden.
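The variation check across repeated runs can be sketched as follows. This covers only the "output varies across candidates" part. It assumes each candidate is a flat mapping of step name to a JSON-serializable output, and `nondeterministic_steps` is a hypothetical name.

```python
import json


def nondeterministic_steps(candidates):
    """Return the names of steps whose output differs across candidate runs."""
    if len(candidates) < 2:
        return set()  # a single run cannot reveal variation
    names = set()
    for cassette in candidates:
        names.update(cassette.keys())
    flaky = set()
    for name in names:
        # Serialize with sorted keys so equal dicts compare equal as strings.
        outputs = {json.dumps(c.get(name), sort_keys=True) for c in candidates}
        if len(outputs) > 1:
            flaky.add(name)
    return flaky
```

A step flagged this way would be excluded from gating but still listed in the report, matching the "reported, never silently hidden" behavior described above.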
