Failure Taxonomy (MAST)

When a trace fails, Retrace classifies the cause using MAST (Multi-Agent System Failure Taxonomy), which defines 14 failure modes across 3 categories. Classification runs automatically on failed traces (an LLM judge, metered separately from your interactive AI quota). The result appears as a detection and in the Insights → Failure Taxonomy breakdown.

1. Specification & System Design

The agent violates the task or role it was given, or the system is poorly specified.

Mode	Name	What it means
`FM-1.1`	Disobey Task Specification	The agent ignores or violates the task's stated constraints, format, or requirements.
`FM-1.2`	Disobey Role Specification	The agent acts outside its assigned role, taking on work it was not meant to do.
`FM-1.3`	Step Repetition	The agent needlessly repeats steps it already completed, stalling progress.
`FM-1.4`	Loss of Conversation History	Earlier context is dropped, so the agent forgets prior constraints or decisions.
`FM-1.5`	Unaware of Termination Conditions	The agent does not recognize when the task is complete and should stop.

2. Inter-Agent Misalignment

Agents miscommunicate or fail to coordinate, so collective progress breaks down.

Mode	Name	What it means
`FM-2.1`	Conversation Reset	The dialogue unexpectedly restarts, discarding accumulated progress.
`FM-2.2`	Fail to Ask for Clarification	The agent proceeds on ambiguous input instead of asking for clarification.
`FM-2.3`	Task Derailment	The agent drifts away from the original objective onto an unrelated path.
`FM-2.4`	Information Withholding	An agent fails to share information that other agents need to proceed.
`FM-2.5`	Ignored Other Agent's Input	The agent disregards a relevant contribution from another agent.
`FM-2.6`	Reasoning–Action Mismatch	The agent's stated reasoning or plan does not match the action it actually takes.

3. Task Verification & Termination

The result is finalized without correct verification, or the run stops at the wrong time.

Mode	Name	What it means
`FM-3.1`	Premature Termination	The run ends before the task is actually finished.
`FM-3.2`	No or Incomplete Verification	The output is not verified, or only partially, letting errors through.
`FM-3.3`	Incorrect Verification	Verification is performed but reaches the wrong conclusion.
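
If you post-process detections yourself, the three tables above can be kept as a small lookup. The sketch below is illustrative only: the names MAST_CATEGORIES, MAST_MODES, and category_of are hypothetical and not part of Retrace. Only the codes and titles come from the tables.

```python
# Illustrative lookup; structure and names are assumptions, not a Retrace API.
# Codes and titles are copied from the tables above.
MAST_CATEGORIES = {
    1: "Specification & System Design",
    2: "Inter-Agent Misalignment",
    3: "Task Verification & Termination",
}

MAST_MODES = {
    "FM-1.1": "Disobey Task Specification",
    "FM-1.2": "Disobey Role Specification",
    "FM-1.3": "Step Repetition",
    "FM-1.4": "Loss of Conversation History",
    "FM-1.5": "Unaware of Termination Conditions",
    "FM-2.1": "Conversation Reset",
    "FM-2.2": "Fail to Ask for Clarification",
    "FM-2.3": "Task Derailment",
    "FM-2.4": "Information Withholding",
    "FM-2.5": "Ignored Other Agent's Input",
    "FM-2.6": "Reasoning–Action Mismatch",
    "FM-3.1": "Premature Termination",
    "FM-3.2": "No or Incomplete Verification",
    "FM-3.3": "Incorrect Verification",
}


def category_of(mode: str) -> str:
    """Return the category title for a mode code such as 'FM-2.3'."""
    if mode not in MAST_MODES:
        raise KeyError(f"unknown MAST mode: {mode}")
    # The number between 'FM-' and '.' is the category number.
    category_number = int(mode.split("-")[1].split(".")[0])
    return MAST_CATEGORIES[category_number]
```

For example, category_of("FM-2.3") returns "Inter-Agent Misalignment".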

How it works

Only failed traces are classified (cost is bounded to the failure population).
Each classification is metered on the mast_classification plan key — separate from ai_request, so background tagging never eats your interactive AI quota.
One classification per trace (idempotent — a re-delivered trace event never re-runs the judge).
When your monthly classification allowance is exhausted, tagging is skipped silently and a single "quota reached" nudge is recorded — never an error.
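
The rules above can be read as a gate in front of the classifier. The following is a minimal sketch of that reading, not Retrace's implementation: the class ClassificationGate and all of its fields are hypothetical. For simplicity it charges one unit per classification, and it assumes a trace skipped for quota is not marked as seen, which the rules above do not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationGate:
    """Hypothetical gate deciding whether a trace event triggers classification."""

    monthly_allowance: int            # mast_classification units per month
    used: int = 0                     # units consumed so far this month
    seen_trace_ids: set = field(default_factory=set)
    nudge_recorded: bool = False      # the single "quota reached" nudge

    def should_classify(self, trace_id: str, failed: bool) -> bool:
        if not failed:
            return False              # only failed traces are classified
        if trace_id in self.seen_trace_ids:
            return False              # idempotent: re-delivery never re-runs
        if self.used >= self.monthly_allowance:
            self.nudge_recorded = True  # recorded once; never raises an error
            return False
        self.seen_trace_ids.add(trace_id)
        self.used += 1                # assumption: one unit per classification
        return True
```

With an allowance of 1, the first failed trace is classified, a re-delivery of the same trace is ignored, and a second failed trace is skipped while the nudge flag is set.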

A cheap embedding tier classifies confident cases for free (no judge call). Low-confidence cases escalate to the LLM judge. If the judge's verdict is itself low-confidence, a 3-vote ensemble runs: the majority wins, and ties stay unclassified. Each vote counts as one mast_classification unit, so an ensemble run costs 3 units.
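
The escalation path can be sketched as follows. This is an illustration under stated assumptions, not Retrace's code: the functions classify and majority, the embed and judge callables, and the 0.8 confidence threshold are all hypothetical. It also assumes the first judge verdict is reused as the first of the three votes, so the ensemble totals 3 units. The text above does not say whether the first judge call is billed separately.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Verdict = Tuple[str, float]  # (mode code, confidence in [0, 1])


def majority(votes: List[str]) -> Optional[str]:
    """Return the label held by a strict majority of votes, else None."""
    top, count = Counter(votes).most_common(1)[0]
    return top if count > len(votes) / 2 else None


def classify(
    trace: object,
    embed: Callable[[object], Verdict],
    judge: Callable[[object], Verdict],
    threshold: float = 0.8,  # hypothetical confidence cutoff
) -> Tuple[Optional[str], int]:
    """Return (mode or None, mast_classification units consumed)."""
    mode, confidence = embed(trace)
    if confidence >= threshold:
        return mode, 0                 # embedding tier: free, no judge call

    mode, confidence = judge(trace)    # first judge call: one unit
    if confidence >= threshold:
        return mode, 1

    # Low-confidence verdict: assumption is that it counts as vote 1 of 3.
    votes = [mode] + [judge(trace)[0] for _ in range(2)]
    return majority(votes), 3          # None means the trace stays unclassified
```

With three votes, a result is returned only when at least two agree. Three different labels produce no majority, and the trace stays unclassified.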

The taxonomy is based on the MAST paper (Multi-Agent System Failure Taxonomy).
