1System drift
The tell: the output stops matching your tokens and conventions while still looking finished.
The guard: the design system is the source of truth, not the latest generation. Diff against it.
AI does not fail loudly. It drifts quietly toward what looks finished. The output reads with confidence, lands on the page clean, and is subtly wrong. That single fact decides the whole method. The machine generates and proposes without end. One role cannot be handed to it: the person who holds the intent fixed and makes the final call.
Fix the intent up front and restate it, so there is a target to hold the work against.
Encode the system in personas, tokens, and an operating standard, so drift is detectable, not a matter of taste.
Nothing ships without clearing the gate. The model never gets the last word.
The work goes out under my name, with my judgment on it, not the model's.
A review pass, drift caught before it ships
Generation got cheap. The moment it did, my value moved off the keyboard and onto the things a model cannot own: the standard the work is measured against, the choice of which option is actually good, and the guarantee that what ships is true. This playbook is the operating system I run on top of every tool, so that a plain instruction produces work that matches my system instead of work that merely looks like it does.
A vague warning to "stay in control" reads like every other note on AI. What makes control real is recognizing the exact way the work wanders. Drift is not one failure. It is seven, and each has a tell you can learn to see. Once it has a name, you can build a guard for it, and the human becomes the fixed point the whole system turns around.
The tell: the output stops matching your tokens and conventions while still looking finished.
The guard: the design system is the source of truth, not the latest generation. Diff against it.
The tell: it solves the prompt and quietly forgets the problem the prompt was for.
The guard: a written problem statement you restate, so there is something to measure the answer against.
The tell: it states invented file paths, metrics, and APIs in the same calm tone it uses for facts.
The guard: every claim traces to a real source. Nothing is true because it was said fluently.
The tell: it mirrors your framing and stops giving you the second opinion you asked it for.
The guard: ask it to argue the other side, then judge the case, not the agreement.
The tell: it converges on the most common pattern and sands off the choice that made the work yours.
The guard: taste is the input the model does not have. You supply the distinctive call, every time.
The tell: over a long session it reintroduces decisions you already killed.
The guard: decisions live in a written standard, not in the chat. The chat is not a record.
The tell: every single step passes review, and the sum of them walks somewhere you never approved.
The guard: review the whole, not only the diff. Step back at the end and check the work against the original intent.
The difference between a clever one off and a repeatable practice is the context layer. Before the work starts, the machine is handed three things. This is what turns "make me a case study page" into output that already knows my conventions, my voice, and my bar.
This is the complete arc, not a slice of it. The front end frames raw intent. The back end measures what shipped. At every phase the split is the same: what I own, what the machine does, the drift that bites hardest here, and the guard that catches it. The other playbooks plug in where named.
| Phase | You own | The machine does | Sharpest drift | The guard |
|---|---|---|---|---|
| Frame | The real question | Drafts framings, surfaces unknowns | Intent | Write the one line problem yourself, first |
| Discover | Which signals are real | Widens research, clusters interviews, drafts the forces of progress | Confidence | Every finding traced to a real source |
| Define | The cut | Proposes problem statements and scope options | Average | Name the bet, the success measure, and what is out |
| Design | The system and the taste | Generates options against the library | System | Tokens and components are the source of truth |
| Build | The architecture | Writes the implementation | Accumulation | Small reviewable diffs, explicit paths staged |
| Verify | The gate | Runs the checks and reports | Confidence | The full pipeline runs before anything is called done |
| Ship | The decision to release | Prepares the deploy | Accumulation | A restore point, then a verified green deploy |
| Grow | What to measure | Drafts experiments, reads the analytics | Agreement | Pre register the metric and the kill condition |
This is where "final decision maker" stops being a sentiment and becomes a mechanism. Nothing the model produces reaches the live site until it clears this. The gate is run, not vibed. When it is green, I sign. When it is not, the work goes back, not out.
An operating model is also a set of edges. There are places the machine should not be in the loop at all, and naming them is part of the discipline.
AI drifts toward plausible. The human holds the line on correct.
System. Intent. Confidence. Agreement. Average. Memory. Accumulation.
Guide the intent. Encode the standard. Make the final call.
Dashes, structure, machine readability, render, contrast. Green, then signed.
A minimal version of the briefing layer and the gate, written so you can adapt it to your own stack. The persona sets the role. The pre ship checklist is the gate in plain text.
# brief.md (pin intent before any generation)
Problem (one line): ____________________________________
Definition of done: ___________________________________
Hard constraints: ___________________________________
Out of scope: ___________________________________
# persona.md (the role the model starts inside)
You are a [discipline] specialist. You hold the standards of
that craft. You propose, the human disposes. You cite real
sources only. You never present a guess in the voice of a fact.
# preship.md (the gate, run not vibed)
[ ] dash sweep clean (entities included)
[ ] no duplicate ids, tags balanced, anchors resolve
[ ] JSON-LD parses, inline scripts pass a syntax check
[ ] page boots in a smoke test, render checked by eye
[ ] color pairs measured, WCAG AA confirmed
[ ] whole reviewed against original intent, then signed