Failure Taxonomy (MAST)
The 14 MAST failure modes Retrace auto-detects on failed traces, grouped into 3 categories.
When a trace fails, Retrace classifies why using MAST (the Multi-Agent System Failure Taxonomy): 14 failure modes across 3 categories. Classification runs automatically on failed traces, using an LLM judge that is metered separately from your interactive AI quota. The result appears as a detection and in the Insights → Failure Taxonomy breakdown.
1. Specification & System Design
The agent violates the task or role it was given, or the system is poorly specified.
| Mode | Name | What it means |
|---|---|---|
| FM-1.1 | Disobey Task Specification | The agent ignores or violates the task's stated constraints, format, or requirements. |
| FM-1.2 | Disobey Role Specification | The agent acts outside its assigned role, taking on work it was not meant to do. |
| FM-1.3 | Step Repetition | The agent needlessly repeats steps it already completed, stalling progress. |
| FM-1.4 | Loss of Conversation History | Earlier context is dropped, so the agent forgets prior constraints or decisions. |
| FM-1.5 | Unaware of Termination Conditions | The agent does not recognize when the task is complete and should stop. |
2. Inter-Agent Misalignment
Agents miscommunicate or fail to coordinate, so collective progress breaks down.
| Mode | Name | What it means |
|---|---|---|
| FM-2.1 | Conversation Reset | The dialogue unexpectedly restarts, discarding accumulated progress. |
| FM-2.2 | Fail to Ask for Clarification | The agent proceeds on ambiguous input instead of asking for clarification. |
| FM-2.3 | Task Derailment | The agent drifts away from the original objective onto an unrelated path. |
| FM-2.4 | Information Withholding | An agent fails to share information that other agents need to proceed. |
| FM-2.5 | Ignored Other Agent's Input | The agent disregards a relevant contribution from another agent. |
| FM-2.6 | Reasoning–Action Mismatch | The agent's stated reasoning or plan does not match the action it actually takes. |
3. Task Verification & Termination
The result is finalized without correct verification, or the run stops at the wrong time.
| Mode | Name | What it means |
|---|---|---|
| FM-3.1 | Premature Termination | The run ends before the task is actually finished. |
| FM-3.2 | No or Incomplete Verification | The output is not verified, or only partially, letting errors through. |
| FM-3.3 | Incorrect Verification | Verification is performed but reaches the wrong conclusion. |
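If you post-process detections yourself, a small lookup table can map a mode ID back to its name and category. The sketch below is illustrative only: the mode IDs and names come from the tables above, but `MAST_MODES`, `MAST_CATEGORIES`, and `category_of` are hypothetical names, and it assumes a detection exposes the mode as a string such as `FM-2.3`.

```python
# Hypothetical lookup tables built from the taxonomy tables above.
MAST_CATEGORIES = {
    1: "Specification & System Design",
    2: "Inter-Agent Misalignment",
    3: "Task Verification & Termination",
}

MAST_MODES = {
    "FM-1.1": "Disobey Task Specification",
    "FM-1.2": "Disobey Role Specification",
    "FM-1.3": "Step Repetition",
    "FM-1.4": "Loss of Conversation History",
    "FM-1.5": "Unaware of Termination Conditions",
    "FM-2.1": "Conversation Reset",
    "FM-2.2": "Fail to Ask for Clarification",
    "FM-2.3": "Task Derailment",
    "FM-2.4": "Information Withholding",
    "FM-2.5": "Ignored Other Agent's Input",
    "FM-2.6": "Reasoning–Action Mismatch",
    "FM-3.1": "Premature Termination",
    "FM-3.2": "No or Incomplete Verification",
    "FM-3.3": "Incorrect Verification",
}


def category_of(mode_id: str) -> str:
    """Return the category name for a mode ID such as 'FM-2.3'."""
    if mode_id not in MAST_MODES:
        raise KeyError(f"unknown MAST mode: {mode_id}")
    # 'FM-2.3' -> '2.3' -> category number 2
    category_number = int(mode_id.split("-")[1].split(".")[0])
    return MAST_CATEGORIES[category_number]
```

For example, `category_of("FM-2.3")` resolves Task Derailment to the Inter-Agent Misalignment category.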
How it works
- Only failed traces are classified (cost is bounded to the failure population).
- Each classification is metered on the `mast_classification` plan key, separate from `ai_request`, so background tagging never eats your interactive AI quota.
- One classification per trace (idempotent: a re-delivered trace event never re-runs the judge).
- When your monthly classification allowance is exhausted, tagging is skipped silently and a single "quota reached" nudge is recorded — never an error.
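The rules in the list above can be pictured as a small gatekeeper that runs before any classification. This is a minimal sketch of the described behavior, not Retrace's implementation: `ClassificationMeter`, its fields, and the returned status strings are all hypothetical, and it counts one unit per classification for simplicity.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ClassificationMeter:
    """Hypothetical gatekeeper for the mast_classification allowance."""

    monthly_allowance: int
    used: int = 0
    nudges: int = 0  # number of "quota reached" nudges recorded
    classified: Set[str] = field(default_factory=set)

    def try_classify(self, trace_id: str, failed: bool) -> str:
        # Only failed traces are classified.
        if not failed:
            return "skipped_not_failed"
        # Idempotent: a re-delivered trace event never re-runs the judge.
        if trace_id in self.classified:
            return "skipped_duplicate"
        # Quota exhausted: skip silently and record the nudge once, never raise.
        if self.used >= self.monthly_allowance:
            if self.nudges == 0:
                self.nudges = 1
            return "skipped_quota"
        self.used += 1
        self.classified.add(trace_id)
        return "classified"
```

Returning a status instead of raising mirrors the "never an error" behavior: callers can log the outcome without any failure handling.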
A cheap embedding tier classifies confident cases for free (no judge call). Low-confidence cases escalate to the LLM judge, and a low-confidence judge verdict triggers a 3-vote ensemble (majority wins, ties stay unclassified). Each vote counts as one `mast_classification` unit, so an ensemble run is 3 units.
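The escalation path can be sketched as a three-stage cascade that returns both the verdict and the number of units consumed. Everything here is an assumption made for illustration: the threshold values, the `embed` and `judge` callables, and in particular the choice to count the first judge verdict as vote 1 of the ensemble (so that an ensemble run totals 3 units, as stated above).

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Verdict = Tuple[str, float]  # (mode ID, confidence in [0, 1])

# Hypothetical thresholds; the real cut-offs are not documented here.
EMBED_THRESHOLD = 0.85
JUDGE_THRESHOLD = 0.70


def majority(votes: List[str]) -> Optional[str]:
    """Return the most common vote, or None when the top two are tied."""
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ties stay unclassified
    return ranked[0][0]


def classify(
    trace: str,
    embed: Callable[[str], Verdict],
    judge: Callable[[str], Verdict],
) -> Tuple[Optional[str], int]:
    """Return (mode or None, mast_classification units consumed)."""
    # Tier 1: embedding match, free when confident.
    mode, confidence = embed(trace)
    if confidence >= EMBED_THRESHOLD:
        return mode, 0
    # Tier 2: a single LLM judge call, one unit.
    mode, confidence = judge(trace)
    if confidence >= JUDGE_THRESHOLD:
        return mode, 1
    # Tier 3: 3-vote ensemble. Assumption: the first verdict is vote 1,
    # so two more judge calls bring the total to 3 units.
    votes = [mode] + [judge(trace)[0] for _ in range(2)]
    return majority(votes), 3
```

With three votes, a tie can only occur when all three verdicts differ, in which case the trace stays unclassified but the run still consumes 3 units.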
The taxonomy is based on the MAST paper (Multi-Agent System Failure Taxonomy).