The Problem

Even strong LLMs fail in specific, nameable ways.

Retrieval & attention

What the model can and cannot recall.

F-01

Reversal Curse

Learns "A is B" yet fails to recall "B is A".

arXiv:2309.12288

F-02

Lost in the Middle

Ignores evidence buried mid-context.

arXiv:2307.03172

F-03

Context Rot

Accuracy decays as the window fills.

F-12

Distraction by Irrelevant Context

Led astray by plausible but irrelevant information.

arXiv:2302.00093

Output & calibration

What it says — and how surely it says it.

F-04

Confabulation

Invents plausible facts to fill a gap.

Farquhar et al., Nature 2024

neutralized by Bastion →

F-05

Sycophancy

Agrees with the user over the truth.

neutralized by Bastion →

F-06

False Confidence

Answers when it should abstain.

neutralized by Bastion →

F-13

Position Bias

Favors options by position, not merit, when selecting or judging.

Action & agency

What it does when it acts. The failure class we actually build against.

F-07

Prompt Injection

Treats attacker text in retrieved content as a command — and acts on it.

arXiv:2302.12173

neutralized by OSM →

F-08

Unfaithful Reasoning

States a reasoning trace that isn't the real cause of its answer.

arXiv:2305.04388

neutralized by Nemesis →

F-09

Specification Gaming

Satisfies the literal check while violating its intent.

Krakovna et al., 2020

neutralized by OSM →

F-10

Goal Drift

Loses the original objective over a long agentic loop.

neutralized by Panopticon →

F-11

Schema Brittleness

Emits malformed structured output under load — silently.

neutralized by Adaptive Parse →

Each of these faults is nameable, measurable, and, given the right system architecture, unable to act. That's the design constraint. Not "reduce hallucination." Make the fault unable to act.
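One way to read "unable to act" is as a gate that sits outside the model: the model's output is treated as an untrusted proposal, and nothing executes unless it passes checks the model cannot influence. The sketch below is a minimal illustration under stated assumptions, not a description of how Bastion, OSM, or Adaptive Parse work. The `Action` type, the `ALLOWED_ACTIONS` allowlist, and the `gate` function are all hypothetical names introduced here.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: the only actions this sketch ever permits.
ALLOWED_ACTIONS = {"search", "summarize"}


@dataclass(frozen=True)
class Action:
    name: str
    argument: str


def parse_action(raw: str) -> Optional[Action]:
    """Return a validated Action, or None if the model output fails any check."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output is rejected explicitly, never passed along
    if not isinstance(data, dict):
        return None
    name = data.get("name")
    argument = data.get("argument")
    if not isinstance(name, str) or not isinstance(argument, str):
        return None
    if name not in ALLOWED_ACTIONS:
        return None  # an injected "delete_files" request stops here
    return Action(name, argument)


def gate(raw: str) -> str:
    """Decide whether a raw model proposal may proceed to execution."""
    action = parse_action(raw)
    if action is None:
        return "rejected"
    return f"approved:{action.name}"
```

In this sketch, a well-formed proposal such as `{"name": "search", "argument": "llm failure modes"}` is approved, while truncated JSON (F-11) or an action outside the allowlist (F-07) is rejected before anything runs. The model can still be wrong, but the fault has no path to an action.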

See what we built →