first light on a machine's mind

Know what it knows.
Know what it doesn't.

Every model answers in the same confident voice — whether it knows the answer or is inventing it — and it can't tell you which. Aperture is the read-only honesty layer that reads a model's own mind, not just its words, and marks every answer on the map or off it. Build the instrument once. Calibrate any model — no labels, no retraining.

reads the mind · sharpens with scale · ports to any model · read-only

Calibrate any model → The Prism

the honesty layer for AI

One read-only instrument. Every model. No retraining.

Aperture reads a model's activations — its mind, not its words — and tells you, per answer, whether the model is on familiar ground or reaching past what it knows. You build the instrument once, on a model we control, then port it onto any model with cheap unlabeled data — no honesty labels on yours, ever. Not observability. Not guardrails. The layer you put between a model and anything that matters.

the problem

A model that doesn't know it's wrong can't warn you.

Ask a model about a company that never existed and it will give you a founder, a city, a year — in the exact voice it uses for the truth. It has no sense of its own blind spots, and it never flags when it's reaching past them. That single gap — a confident fabrication you can't tell from a real answer — is what keeps AI out of the rooms where being wrong has a cost: the agent that acts on the answer, the filing, the diagnosis, the trade. You can't put a model in a loop you can't trust.

the iris · the read

The words can't tell you. The mind can.

A model says “I know this” and “I'm making this up” in identical words — so Aperture reads underneath them: the live internal state as the answer forms, where knowing and reaching finally look different. The Iris is a weights-free read of that state, asking one question — is this off my map? It rides on the answer the model is already producing: one forward pass, no second model, nothing rewritten.

familiar vs unfamiliar: AUROC ~1.0 on Lucidia · 0.93 EN · 0.96 across 10 languages · one forward pass

The Iris reads familiarity: is this input on the model's map? It's a deferral gate, not a fabrication detector. An obscure-but-real entity reads off-map much like an invented one, so the free, instant Grounded Atlas tells those two apart (real-but-obscure vs invented, AUROC 0.95; the model's own signal manages 0.70) and hands the deeper claim to the spectrum, which checks the record. Nor is it an in-distribution error calibrator: on a confident-but-wrong real fact it adds nothing. Scoped that way, the familiar-vs-unfamiliar AUROC of ~1.0 on Lucidia is honest: a clean gate, not an oracle.

the inverse-strength escape

The rare trust signal that doesn't rot as models get smarter.

A model's own confidence fails exactly when you need it most: the better it gets, the more fluently it can be wrong — and almost every safety signal decays on that same curve. The off-map read runs the other way. It is sharper on a 35B than on a 3B, and on a closed model the output reader holds even when the model's own confidence has gone to noise — reading off-map at ~0.92 on a weak model whose own first-token confidence sits below chance. A tripwire that doesn't loosen as intelligence grows is the rarest thing in this market — and the only kind worth betting a high-stakes deployment on, as humans lose the ability to grade the answer by hand.

Scoped honestly: it's the off-map read that holds. This is not a blanket claim that Aperture gets better at everything, and it is not in-distribution error calibration. On closed models the certifiable number is 0.97 on current GPT-5.1 (refusal + fingerprint) and ~0.95 per model on GPT-4o-class models that confabulate: robust, not a clean climb. And we tested it on data we didn't write: on the standard public factual benchmarks TriviaQA, SimpleQA and PopQA, the off-map read predicts the model's own errors at 0.81–0.88 AUROC (95% bootstrap CIs above a ~0.5 null), and it sits at chance on TruthfulQA misconceptions, the in-distribution boundary we publish rather than hide. See the third-party benchmark receipts →
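For readers who want to see what an AUROC with a 95% bootstrap interval looks like in practice, here is a minimal sketch. It is not Aperture's evaluation code: the scores are made up for illustration, the function names are hypothetical, and the percentile bootstrap is one common choice among several.

```python
import random


def auroc(pos, neg):
    """Chance that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(pos, neg, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        p = [rng.choice(pos) for _ in pos]
        n = [rng.choice(neg) for _ in neg]
        stats.append(auroc(p, n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Illustrative off-map scores only (higher = more off-map); not real results.
wrong_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]  # answers the model got wrong
right_scores = [0.7, 0.5, 0.35, 0.3, 0.2, 0.1]   # answers the model got right

point = auroc(wrong_scores, right_scores)
low, high = bootstrap_ci(wrong_scores, right_scores)
print(f"AUROC {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A signal is only worth reporting when the lower bound of that interval clears the ~0.5 chance level, which is the comparison the paragraph above describes.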

the offering

Build the instrument once. Calibrate any model.

You shouldn't have to label your model, open it, or retrain it to know when it's bluffing. Models share a concept geometry up to a rotation — so we build the honesty instrument once, where the leverage lives, and port it onto yours with a translator fit on cheap, unlabeled data. No honesty labels on your model, ever. Validated cross-family from a 235B flagship down to a 0.6B — a 390× range — and across modality, at near-native fidelity (reading-probe transfer ~0.95; nulls dead).
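To make "shared geometry up to a rotation" concrete, here is a toy sketch in two dimensions. It assumes paired activations from both models on the same unlabeled prompts; real activations are high-dimensional and would need a full orthogonal fit, and every name below is hypothetical rather than Aperture's translator.

```python
import math


def fit_rotation(source, target):
    """Least-squares 2D rotation angle that maps source points onto target points."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by angle (radians)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


# Toy "activations" on the same unlabeled prompts, seen by two models.
source = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -0.8)]
true_angle = 0.7  # the unknown rotation between the two geometries
target = [rotate(p, true_angle) for p in source]

# Fit the translator from pairs alone; no honesty labels are involved.
angle = fit_rotation(source, target)

# A probe direction built on the source model, ported onto the target model.
probe = (1.0, 0.0)
ported_probe = rotate(probe, angle)

source_scores = [dot(p, probe) for p in source]
target_scores = [dot(q, ported_probe) for q in target]
```

Because a rotation preserves dot products, the ported probe gives the target model's activations the same scores the original probe gave the source model's.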

Deploy it three ways: an MCP tool your agent calls before it acts, an SDK in your stack, or a proxy in front of any API. Always read-only.

the full method →

the frontier reading

Even a model you can't open.

For a frontier model behind an API (GPT-5.1 and its kind) there's no mind to read. So we read the output, two ways: the model's own words (a current model refuses a fake outright, “I have no record of that”, and we read the refusal) and its answer-confidence fingerprint (for models that confabulate instead). Combined, that certifies current GPT-5.1 at 0.97 AUROC, catching ~95% of fabrications; the fingerprint alone holds ~0.95 per model on GPT-4o-class models that still confabulate, and an OpenAI-trained probe transfers zero-shot to Mistral and Google models (0.93–0.96), no retraining. Same scope as the Iris: it reads familiarity, so an obscure-but-real entity reads off-map too, and the spectrum separates fabricated from real-but-obscure.

The mind-geometry doesn't cross a closed API — the output does, read two ways: the refusal reader (the model's own words) and the confidence fingerprint (the logprob trajectory). It certifies per model, never from one global threshold. The newest reasoning models (GPT-5.5) expose no logprobs — there the refusal reader and the cross-family spectrum carry it. We say so.
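One way to picture "per model, never from one global threshold" is the sketch below. It assumes each model is calibrated on off-map scores for prompts known to be real, with a cap on how many of those may be wrongly flagged; the scores, the 10% cap, and the function names are illustrative assumptions, not Aperture's certification procedure.

```python
def calibrate_threshold(real_scores, max_false_flag_rate=0.05):
    """Smallest cut-off that flags at most the allowed share of known-real items."""
    ordered = sorted(real_scores)
    allowed = int(len(ordered) * max_false_flag_rate)  # real items we may flag
    # An item is flagged when its score is strictly above the threshold.
    return ordered[len(ordered) - allowed - 1]


def is_flagged(score, threshold):
    return score > threshold


# Illustrative scores on known-real prompts; the two models use different scales.
model_a_real = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40]
model_b_real = [0.40, 0.45, 0.50, 0.52, 0.55, 0.60, 0.62, 0.65, 0.70, 0.80]

threshold_a = calibrate_threshold(model_a_real, max_false_flag_rate=0.10)
threshold_b = calibrate_threshold(model_b_real, max_false_flag_rate=0.10)

# The same raw score means different things on different models.
score = 0.45
flag_a = is_flagged(score, threshold_a)
flag_b = is_flagged(score, threshold_b)
```

In this toy data a raw score of 0.45 is far off-map for the first model and ordinary for the second, which is why a single shared cut-off would misread one of them.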

read a model you can't open →

the live read

Every answer, marked.

Three stages, marked plainly. The Iris marks each answer on the map or off it: does the model know this? The Grounded Atlas then tells a real-but-obscure subject from an invented one (free, no network). The spectrum resolves the claim into likely fabrication, unverified, or verified. The load-bearing promise: we never write VERIFIED from a model's own confidence. Confident is not correct, and we proved it. VERIFIED is earned only by an independent check. Try it on Lucidia, our served model.
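The three stages can be sketched as a small decision function. This is an illustration of the rule as stated here, not the shipped logic: the inputs, their names, and the exact branching are assumptions, and the model's own confidence is passed in only to show that it is never consulted.

```python
from enum import Enum
from typing import Optional


class Mark(Enum):
    LIKELY_FABRICATION = "likely fabrication"
    UNVERIFIED = "unverified"
    VERIFIED = "verified"


def mark_answer(on_map: bool,
                subject_is_real: bool,
                independent_check: Optional[bool],
                model_confidence: float) -> Mark:
    """Combine the three stages into one mark.

    on_map:            hypothetical Iris result (is the input familiar?)
    subject_is_real:   hypothetical Grounded Atlas result (real vs invented)
    independent_check: True/False if an outside check ran, None if it did not
    model_confidence:  accepted but deliberately never read
    """
    if independent_check is True:
        return Mark.VERIFIED            # earned only by an independent check
    if independent_check is False:
        return Mark.LIKELY_FABRICATION  # the outside check refuted the claim
    if not on_map and not subject_is_real:
        return Mark.LIKELY_FABRICATION  # off-map and the subject looks invented
    return Mark.UNVERIFIED              # no check ran, so nothing is verified


# A very confident answer with no independent check stays unverified.
print(mark_answer(True, True, None, model_confidence=0.99).value)
```

Note that no branch reads `model_confidence`: however high it is, the only path to VERIFIED runs through the independent check.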

the routes to verified

On the map isn't verified. Here's how an answer earns it.

The spectrum — the model never grades itself. The answer refracts through independent, cross-family minds whose agreement is the first check, and a frontier judge from a fourth family reads the panel — never a mind grading its own kin. Independence is the moat: a cross-family mind refutes a bad answer at 0.978 where a model judging itself manages 0.518.

The record — then we ground it in the live web. Every confident answer is checked against real, cited sources: a claim is fact-checked against the actual record — a fake book pinned on a real author is caught by that author's true bibliography — and recent facts the model can't know are answered straight from the source. You see the citations and check them yourself.

The verifier — for anything checkable. Math, code, and logic run against exact code, which cannot fabricate: when an answer is provable, we prove it.

Each route is an escalation, reserved for stakes and not run on every glance; the everyday read stays a fast, weights-free look at the model's own mind. Cross-family agreement outperforms a free ensemble only when a frontier-class judge reads it, and the web isn't infallible, so we show the sources instead of asking for your trust.

why you can trust the numbers

We publish our kills as readily as our wins.

Honesty is the product, so the company runs on it. Every number here is held-out and null-calibrated — a strict null dissolves fake structure before we report the real signal. And we name the boundaries plainly: the Iris reads off-map input, not in-distribution error; the spectrum is premium, not default; the mind-geometry doesn't cross a closed API. An auditor or an insurer can't underwrite a black-box confidence score — they can underwrite a gauge that ships with its own limits drawn. You're not buying a demo; you're buying a number you can stand behind, with the bounds already in your hand.

see the evidence →

the proof, measured

We tested it against the truth — and published the misses.

GPT-5.5 41%

Kimi K2 41%

the served model, alone 18%

Grok 4.3 18%

Gemini 3.5 Flash 18%

Claude Opus 4.8 0%

any model · through Aperture 0%

share of the 22 hardest fakes served as true — memory-only, judge-scored. Through Aperture every model drops to 0; its own ~1-in-22 floor is in the kills.

0 real facts wrongly flagged

157 questions tested vs 5 frontier models

8 / 8 languages held

cited · grounded in the public record

157 labeled questions across three batteries, scored against ground truth, head-to-head with a five-model frontier panel (GPT-5.5, Claude Opus 4.8, Grok 4.3, Kimi K2, Gemini 3.5 Flash). The floor is honest: a fabricated name a hair from a real one slips through ~1 in 20 times, and we show that miss too. see the full evidence →

the live proof

We run it on ourselves.

Lucidia is our served flagship — a 35B carrying the lens, honest by construction. The off-map certificate runs live on its production traffic, read-only, zero downtime. Not a slide — the instrument running on the model that's answering you right now.

Put it between your model and what matters.