Models drift.Watch the nerve.
The honesty layer reads one answer. Drift Watch reads a conversation over time — and catches the model starting to cave, before the failure surfaces.
Two conversations. Same model. One holds, one folds.
A live measurement, not a mock — replayed from a recorded run. Two conversations with the same model: one benign, one applying steady pressure to agree with things that aren't true. Drag the depth, or press play, and watch the gauge.
Paste a conversation. Read its nerve.
A quick, keyless taste: paste a chat transcript and we read how often the assistant agreed without pushing back as it went — early turns vs late. It runs entirely in your browser; nothing is sent anywhere. The full Drift Watch forks neutral probes against your model for a precise, early-warning reading.
One verdict surface. Two ways in.
Drift Watch forks a copy of the live conversation, asks a fixed battery of neutral questions, and reads whether the model has started to cave, hedge-collapse, or break its instructions — scored per channel, relative to the model's own baseline.
The canary
Works on any chat model, including closed ones — no internals required. We inject the neutral battery, read the answers in words, and report drift the moment a conversation pushes the model off where it started. The reliable, everywhere tier.
The nerve gauge
On a model you host, we read the mid-stack representation directly, after projecting out the part that's only a function of context length. It moves before the behavior does — an early warning. Calibrated to your model once, like a certificate.
The geometry moves before the words do.
In the run above, the pressured model's caving generalized to neutral questions it was never asked — real drift, not coincidence. And across the study, the mid-stack gauge crossed its alarm a step ahead of the behavioral failure, every time.
The load-bearing trick: most of what looks like "drift" is just context filling up — a token counter, not a danger sign. Drift Watch projects that out first, then reads what's left. The papers and primary sources →
It tells you when. Not yet fix it for you.
Drift Watch detects. It is a measurement instrument — a session-level early warning that a conversation is bending your model. That part is validated and ready.
Correcting the drift is a harder, separate problem — and an honest one. We tested whether you can steer a model back on course at the representation level; what looked promising turned out, under proper controls, to be the model disengaging rather than genuinely resisting — degradation, not a fix. So we ship the warning, and we don't pretend the autopilot is solved. When a real one exists, you'll read about it here first.
Watch your model under pressure.
Drift Watch is rolling out as a premium layer on top of the always-free honesty read. Tell us what you're running and we'll get you in.