Eval Gates
CI/CD quality gates that block deploys when agent quality drops.
Eval gates integrate into your CI/CD pipeline to block deploys when agent quality regresses. Each gate scores a trace against an evaluation, compares the score with a threshold and, optionally, with the average of baseline runs, and returns pass/fail.
Running an Eval Gate
curl -X POST https://api.retraceai.tech/api/v1/evaluations/:id/gate \
  -H "x-retrace-key: rt_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "abc-123",
    "threshold": 0.8,
    "baseline_run_ids": ["run-1", "run-2"]
  }'
Response:
{
  "pass": false,
  "score": 0.72,
  "threshold": 0.8,
  "regression_detected": true,
  "baseline_avg": 0.85,
  "delta": -0.13
}
GitHub Actions Integration
- name: Eval Gate
  uses: yash1511-bogam/retrace/packages/action@main
  with:
    api-key: ${{ secrets.RETRACE_API_KEY }}
    evaluation-id: ${{ env.EVAL_ID }}
    trace-id: latest
    threshold: "0.8"
    api-url: https://api.retraceai.tech
    post-pr-comment: "true"
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
When post-pr-comment is "true", the action:
- Runs the evaluation against the specified trace
- Auto-publishes the trace as an unlisted tape
- Posts a PR comment with scores table + a "View full trace replay" link
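The step above sets GITHUB_TOKEN in its env, presumably so the action can post that comment. If you want to post a similar comment from your own script, here is a minimal standard-library sketch. It uses GitHub's REST endpoint for issue comments, because comments on a pull request's conversation go through the issues API. The helper name build_pr_comment_request is hypothetical. This is not a description of how the Retrace action works internally.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pr_comment_request(repo, pr_number, token, body):
    """Build (but do not send) a request that comments on a pull request.

    repo is "owner/name" and pr_number is the pull request number.
    Hypothetical helper, not part of the Retrace action.
    """
    # PR conversation comments are created through the issues comments endpoint.
    url = "{}/repos/{}/issues/{}/comments".format(GITHUB_API, repo, pr_number)
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Inside a workflow, GITHUB_REPOSITORY already holds "owner/name". You would need to supply the PR number yourself, for example through an environment variable you export. Send the request with urllib.request.urlopen; GitHub answers 201 when the comment is created.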
Example PR comment:
🔍 Retrace Eval Gate
Score: 87.3% ✅ (threshold: 80.0%)
Criterion    Score
Accuracy     92.1%
Relevance    83.4%
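On GitHub, a comment like this is written in Markdown, with the scores table as a Markdown table. The sketch below is hypothetical: render_gate_comment is not a Retrace API. It treats scores as fractions in [0, 1], like the gate's score field, and it assumes that a score equal to the threshold passes.

```python
def render_gate_comment(score, threshold, criteria, replay_url=None):
    """Render a Markdown PR comment shaped like the example above.

    score and threshold are fractions in [0, 1]; criteria maps each
    criterion name to its fractional score. Hypothetical helper.
    """
    # Assumption: a score equal to the threshold counts as a pass.
    status = "✅" if score >= threshold else "❌"
    lines = [
        "🔍 Retrace Eval Gate",
        "",
        "Score: {:.1%} {} (threshold: {:.1%})".format(score, status, threshold),
        "",
        "| Criterion | Score |",
        "|-----------|-------|",
    ]
    for name, value in criteria.items():
        lines.append("| {} | {:.1%} |".format(name, value))
    if replay_url:
        # Here the caller supplies the replay link (e.g. the published tape URL).
        lines += ["", "[View full trace replay]({})".format(replay_url)]
    return "\n".join(lines)
```

The blank lines matter: GitHub needs them to render the title, the score line, and the table as separate blocks.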
The CLI equivalent of the GitHub Action exits with code 1 when the gate fails, which fails the CI job and blocks the deploy:
retrace eval gate --evaluation $EVAL_ID --trace $TRACE_ID --threshold 0.8
GitHub App
Install the retrace-eval-gate GitHub App to automatically post eval results as PR checks. Connect from Settings → Integrations in the dashboard.
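Outside GitHub, any deploy script can call the gate endpoint from Running an Eval Gate. The sketch below uses only the Python standard library and mirrors the curl example: the same path, the same x-retrace-key header, and the same JSON body. The helper names are hypothetical and this is not an official Retrace SDK. It also assumes the API returns a 2xx status with a JSON body even when the gate fails; urlopen raises HTTPError on other statuses.

```python
import json
import urllib.parse
import urllib.request

DEFAULT_API_URL = "https://api.retraceai.tech"


def build_gate_request(evaluation_id, api_key, trace_id, threshold,
                       baseline_run_ids=None, api_url=DEFAULT_API_URL):
    """Build (but do not send) a POST to /api/v1/evaluations/:id/gate."""
    path = "/api/v1/evaluations/{}/gate".format(
        urllib.parse.quote(evaluation_id, safe=""))
    body = {"trace_id": trace_id, "threshold": threshold}
    if baseline_run_ids:
        # Optional: lets the gate compare against the average of these runs.
        body["baseline_run_ids"] = list(baseline_run_ids)
    return urllib.request.Request(
        api_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-retrace-key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def gate_exit_code(result):
    """Map a gate response to a process exit code: 0 on pass, 1 on fail."""
    return 0 if result.get("pass") is True else 1


def run_gate(request):
    """Send the request, print the response, and return the CI exit code."""
    # Assumes a 2xx response with a JSON body, even for a failing gate.
    with urllib.request.urlopen(request) as resp:
        result = json.load(resp)
    print(json.dumps(result, indent=2))
    return gate_exit_code(result)
```

A deploy script can end with sys.exit(run_gate(build_gate_request(os.environ["EVAL_ID"], os.environ["RETRACE_API_KEY"], os.environ["TRACE_ID"], 0.8))). This gives the same contract as retrace eval gate: exit code 1 blocks the deploy.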
CLI
retrace eval run --evaluation <id> --traces <id1>,<id2>
retrace eval gate --evaluation <id> --trace <id> --threshold 0.8
Regression Detection
When baseline_run_ids are provided, the gate compares the current score against the average of baseline runs. If the score drops by more than 10% relative to the baseline, regression_detected is set to true.
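The rule above can be written out directly. This sketch makes several assumptions. It works on scores you have already collected for the baseline runs, since how the API resolves baseline_run_ids to scores is not documented here. It treats pass as depending only on score >= threshold. It reads "more than 10%" as a strict relative drop, (baseline_avg - score) / baseline_avg > 0.10. evaluate_gate is a hypothetical helper.

```python
from statistics import mean


def evaluate_gate(score, threshold, baseline_scores=None, max_relative_drop=0.10):
    """Sketch of the documented pass/regression logic.

    Assumptions: pass means score >= threshold, and a regression means the
    score fell more than max_relative_drop below the baseline average.
    """
    result = {"pass": score >= threshold, "score": score, "threshold": threshold}
    if baseline_scores:
        baseline_avg = mean(baseline_scores)
        delta = score - baseline_avg
        # Relative drop as a fraction of the baseline; guard against a zero baseline.
        relative_drop = -delta / baseline_avg if baseline_avg else 0.0
        result.update(
            baseline_avg=baseline_avg,
            delta=round(delta, 4),
            regression_detected=relative_drop > max_relative_drop,
        )
    return result
```

Take the numbers from the example response: score 0.72 and baseline_avg 0.85. The delta is -0.13, and the relative drop is 0.13 / 0.85 ≈ 15.3%. That exceeds the 10% cutoff, which is why regression_detected is true there.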