Gated
Closed frontier
Most capable, most aligned — but opaque and policy-gated. We use them where the role doesn't require trusting the weights — embeddings, an adversarial second opinion — and gate them everywhere a hallucination could act.
Systems are what we ship. Instruments are the model substrates we study — because the architecture itself is a lever on the faults. The experimental counterpart to the systems.
Uncensored baseline
The unaligned control group. Sycophancy, refusal boundaries, and the ceiling of trainability can't be measured on a model aligned to hide them. We run a low-refusal model as the instrument that exposes those behaviours — and as a probe to map where censored models break.
Attacks: Sycophancy →
Beyond attention
State-space models as a non-transformer substrate — linear-time, with a different recall geometry. Attention's failure modes are partly architectural. We tested pure state-space models rigorously; they didn't beat our hybrid. The working answer in our stack is a hybrid SSM design with causal compression.
Attacks: The reversal curse, lost in the middle →
Token-level diffusion
Generation by global denoising instead of left-to-right next-token prediction. Confabulation is partly a mechanism of autoregression: a committed token can't be revised, so the model must keep going plausibly. Diffusion refines the whole sequence under global constraints, and the denoising trace doubles as a pre-hoc confidence signal. A substrate we're betting on, with early evidence — not a shipped result.
Attacks: Confabulation →The field we work across
Reliability isn't a property of any one model. We build it in the system layer and run it across the whole field — because the faults are universal across every family.
Gated
Most capable, most aligned — but opaque and policy-gated. We use them where the role doesn't require trusting the weights — embeddings, an adversarial second opinion — and gate them everywhere a hallucination could act.
Rising
Closing the capability gap fast, on a different alignment regime — and frequently released open-weight, which puts real capability within reach of inspection.
Open
Inspectable, self-hostable, modifiable. The only substrate we can instrument from the inside — which is why every instrument above is open-weight: you can only study a failure where you can see the weights.
Systems are what we ship; instruments are how we measure. One constraint runs through both: make the failure unable to act.