AI operating model

The model drifts toward plausible. You hold the line on correct.

AI does not fail loudly. It drifts quietly toward what looks finished. The output reads with confidence, lands on the page clean, and is subtly wrong. That single fact decides the whole method. The machine generates and proposes without end. One role cannot be handed to it: the person who holds the intent fixed and makes the final call.


01 Guide

Fix the intent up front and restate it, so there is a target to hold the work against.

02 Standard

Encode the system in personas, tokens, and an operating standard, so drift is detectable, not a matter of taste.

03 Final call

Nothing ships without clearing the gate. The model never gets the last word.

04 Own it

The work goes out under my name, with my judgment on it, not the model's.

A review pass, drift caught before it ships

specimen

draft · agent output, pass 1 · accepted? no · review

flag · control renamed Save then Submit · system drift

flag · cites event signup_done · no such event

flag · states a 40 percent lift · no source

flag · accent fails AA on surface · 3.9 to 1

fix · one name kept, claim cut, color repaired · accepted? yes · signed

Playbook spec

Domain: AI operating model
Scope: Whole practice
Pairs with: The Lab
Form: Standard

00 · The thesis

The machine executes. I own the intent, the context, and the judgment that makes it right.

Generation got cheap. The moment it did, my value moved off the keyboard and onto the things a model cannot own: the standard the work is measured against, the choice of which option is actually good, and the guarantee that what ships is true. This playbook is the operating system I run on top of every tool, so that a plain instruction produces work that matches my system instead of work that merely looks like it does.

One thing this is not. The AI Product Design Lab is how I design AI products other people can trust. This is how I work with AI myself, to produce everything else in this portfolio. Same belief, two different objects. The Lab faces the user. This faces the work.

01 · How AI drifts

Name the drift, and the human stops being a disclaimer.

A vague warning to "stay in control" reads like every other note on AI. What makes control real is recognizing the exact way the work wanders. Drift is not one failure. It is seven, and each has a tell you can learn to see. Once it has a name, you can build a guard for it, and the human becomes the fixed point the whole system turns around.

1 System drift

The tell: the output stops matching your tokens and conventions while still looking finished.

The guard: the design system is the source of truth, not the latest generation. Diff against it.

2 Intent drift

The tell: it solves the prompt and quietly forgets the problem the prompt was for.

The guard: a written problem statement you restate, so there is something to measure the answer against.

3 Confidence drift

The tell: it states invented file paths, metrics, and APIs in the same calm tone it uses for facts.

The guard: every claim traces to a real source. Nothing is true because it was said fluently.

4 Agreement drift

The tell: it mirrors your framing and stops giving you the second opinion you asked it for.

The guard: ask it to argue the other side, then judge the case, not the agreement.

5 Average drift

The tell: it converges on the most common pattern and sands off the choice that made the work yours.

The guard: taste is the input the model does not have. You supply the distinctive call, every time.

6 Memory drift

The tell: over a long session it reintroduces decisions you already killed.

The guard: decisions live in a written standard, not in the chat. The chat is not a record.

7 Accumulation drift

The tell: every single step passes review, and the sum of them walks somewhere you never approved.

The guard: review the whole, not only the diff. Step back at the end and check the work against the original intent.

The pattern under all seven. The model optimizes for plausible. You optimize for correct. The gap between those two is exactly the work that stays human.
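
The guard for system drift, diffing output against the design system, can be sketched in a few lines. This is a minimal illustration, not the check this playbook actually runs: the token names and values in DESIGN_TOKENS are hypothetical, and the function find_system_drift is an assumed name. It flags any CSS custom property that is not in the token set and any raw hex color that bypasses the tokens.

```python
import re

# Hypothetical token set. In practice this would be loaded from the design system.
DESIGN_TOKENS = {
    "--color-accent": "#0b5fff",
    "--color-surface": "#ffffff",
    "--space-2": "8px",
}

VAR_USE = re.compile(r"var\(\s*(--[\w-]+)")
HEX_LITERAL = re.compile(r"#[0-9a-fA-F]{3,8}\b")


def find_system_drift(css: str, tokens: dict[str, str]) -> list[str]:
    """Return readable flags for CSS that leaves the token system."""
    flags = []
    # Custom properties that the design system does not define.
    for name in sorted(set(VAR_USE.findall(css))):
        if name not in tokens:
            flags.append(f"unknown token: {name}")
    # Raw hex colors written inline instead of going through a token.
    for literal in sorted(set(HEX_LITERAL.findall(css))):
        flags.append(f"raw color literal: {literal}")
    return flags


# Illustrative generated output with two departures from the system.
generated = """
.button { background: var(--color-accent); padding: var(--space-2); }
.card   { background: var(--color-panel); border-color: #e1e4e8; }
"""

for flag in find_system_drift(generated, DESIGN_TOKENS):
    print(flag)
```

The regular expressions are deliberately crude. An id selector such as #abc would be reported as a color, so a real check would parse the stylesheet rather than pattern match it.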

02 · The rules it rests on

Six rules everything else descends from.

Intent is the artifact you own. The prompt is just its serialization. If the intent is fuzzy, no amount of prompting saves the output.
Context is the work. A plain instruction only produces on system results when the environment around it was built for that. You build the environment.
The model proposes, you dispose. Generation explores width. Selection is the leverage. Choosing well is the job.
Verify before you trust. Fluency is not evidence. Every generated artifact clears a gate before it counts as done.
Keep a human legible trail. Document the reasoning, not only the output, so the next reader, human or agent, can follow the why.
Taste does not transfer. The model can match a standard once you set it. It cannot originate the standard. That part stays with you.

03 · Briefing the machine

I do not prompt from a blank box. I brief from a system.

The difference between a clever one off and a repeatable practice is the context layer. Before the work starts, the machine is handed three things. This is what turns "make me a case study page" into output that already knows my conventions, my voice, and my bar.

Role specific system prompts. A library of personas, one per discipline: architecture, full stack build, content, growth, discovery. Each one carries the standards of that craft so the model starts inside the role, not outside it.
A project operating standard. A portable document that travels in the repo root. It sets the git hygiene, the directory shape, the backup discipline, and the session handoff rules, so any machine, mine or the model's, resumes cleanly.
A precise brief. The one line problem, the constraints that are not negotiable, and the definition of done. Short, but exact. The brief is where intent gets pinned before a single token is generated.

Why it matters. Context is not preamble. It is the part of the work that decides whether a plain instruction lands on system or drifts off it. Build the context once, and the same instruction produces better work every time you reuse it.
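
One way to make the briefing layer mechanical is to refuse to assemble any context until the brief is complete. The sketch below assumes the three layers arrive as plain strings. The names Brief, missing_fields, and build_context are illustrative, and the sample values are invented for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Brief:
    problem: str              # the one line problem
    hard_constraints: str     # the constraints that are not negotiable
    definition_of_done: str


def missing_fields(brief: Brief) -> list[str]:
    """Names of brief fields that are empty or whitespace only."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


def build_context(persona: str, standard: str, brief: Brief) -> str:
    """Assemble the three briefing layers, refusing to run on a fuzzy brief."""
    missing = missing_fields(brief)
    if missing:
        raise ValueError("brief is incomplete: " + ", ".join(missing))
    if "\n" in brief.problem.strip():
        raise ValueError("problem statement must fit on one line")
    return "\n\n".join([
        "# Persona\n" + persona.strip(),
        "# Operating standard\n" + standard.strip(),
        "# Brief\n"
        f"Problem: {brief.problem.strip()}\n"
        f"Hard constraints: {brief.hard_constraints.strip()}\n"
        f"Definition of done: {brief.definition_of_done.strip()}",
    ])


# Illustrative values only. The definition of done is left blank on purpose.
draft_brief = Brief(
    problem="Visitors cannot tell what the case study proved.",
    hard_constraints="Existing tokens only, no new components.",
    definition_of_done="",
)

try:
    build_context("You are a content specialist.", "See the repo standard.", draft_brief)
except ValueError as err:
    print(err)  # brief is incomplete: definition_of_done
```

The point of raising instead of warning is that intent gets pinned before generation starts, not patched afterward.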

04 · The workflow, phase by phase

From a vague request to a live, growing product, eight phases.

This is the complete arc, not a slice of it. The front end frames raw intent. The back end measures what shipped. At every phase the split is the same: what I own, what the machine does, the drift that bites hardest here, and the guard that catches it. The other playbooks plug in where named.

Phase	You own	The machine does	Sharpest drift	The guard
Frame	The real question	Drafts framings, surfaces unknowns	Intent	Write the one line problem yourself, first
Discover	Which signals are real	Widens research, clusters interviews, drafts the forces of progress	Confidence	Every finding traced to a real source
Define	The cut	Proposes problem statements and scope options	Average	Name the bet, the success measure, and what is out
Design	The system and the taste	Generates options against the library	System	Tokens and components are the source of truth
Build	The architecture	Writes the implementation	Accumulation	Small reviewable diffs, explicit paths staged
Verify	The gate	Runs the checks and reports	Confidence	The full pipeline runs before anything is called done
Ship	The decision to release	Prepares the deploy	Accumulation	A restore point, then a verified green deploy
Grow	What to measure	Drafts experiments, reads the analytics	Agreement	Pre register the metric and the kill condition

Discover and Define run on my Discovery to Scope playbook: Jobs to be Done to find the real job, the Double Diamond to widen then narrow.
Design runs inside my Figma Operating System: the agent generates in the file, the naming, tokens, and status keep it on system.
Build applies the Interface Content System to every string the machine writes, so the copy stays in the customer's voice.

05 · The verification gate

The gate the machine never gets past.

This is where "final decision maker" stops being a sentiment and becomes a mechanism. Nothing the model produces reaches the live site until it clears this. The gate is run, not vibed. When it is green, I sign. When it is not, the work goes back, not out.

Dash sweep. Zero em dashes, en dashes, or dashes used as punctuation, including HTML entities that render as a dash. The prose hyphen check runs on the stripped visible text.
Structure scan. Duplicate id check, tag balance across div, section, article, and nav, and every in page anchor resolves.
Machine readability. The JSON-LD parses, and Node checks every inline script block for syntax.
Render proof. A jsdom smoke test with the right polyfills boots the page, and the rendered output is inspected by eye.
Contrast proof. Color pairs are measured with exact oklab math and confirmed at WCAG AA before any color change ships.
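
Two of these checks, the dash sweep and part of the structure scan, can be sketched with the Python standard library alone. This is an assumed shape, not the actual gate: the names PageScan and scan are hypothetical, the sample page is invented, and tag balance, the prose hyphen check, JSON-LD parsing, and the render proof are left out.

```python
import re
from html.parser import HTMLParser

# Figure dash, en dash, em dash, horizontal bar, and minus sign.
DASHES = re.compile("[\u2012\u2013\u2014\u2015\u2212]")


class PageScan(HTMLParser):
    """Collect visible text, ids, and in page anchor targets."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &mdash; before
        # handle_data sees them, so entity dashes are caught as well.
        super().__init__(convert_charrefs=True)
        self.text, self.ids, self.anchors = [], [], []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        if attrs.get("id"):
            self.ids.append(attrs["id"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.anchors.append(href[1:])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text.append(data)


def scan(page: str) -> list[str]:
    """Return a list of failures. An empty list means these checks are green."""
    parser = PageScan()
    parser.feed(page)
    parser.close()
    failures = []
    if DASHES.search("".join(parser.text)):
        failures.append("dash found in visible text")
    seen = set()
    for id_ in parser.ids:
        if id_ in seen:
            failures.append(f"duplicate id: {id_}")
        seen.add(id_)
    for target in parser.anchors:
        if target not in seen:
            failures.append(f"anchor does not resolve: #{target}")
    return failures


sample_page = """
<nav><a href="#thesis">Thesis</a> <a href="#gate">Gate</a></nav>
<section id="thesis"><p>Plausible &mdash; not correct.</p></section>
<section id="thesis"><p>Second copy.</p></section>
"""

for failure in scan(sample_page):
    print(failure)
```

Returning a list of failures rather than a single boolean keeps the result legible: each red item names what to fix before the work goes back through the gate.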

The point of the gate. It converts trust into evidence. The model's confidence is not admissible. Only a passed check is.
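
The contrast proof is the easiest check to show as evidence, because WCAG defines the ratio exactly. The sketch below assumes each color has already been converted to an sRGB hex value. Any OKLab conversion happens before this step and is not shown. The function names are illustrative.

```python
def _linear(channel_8bit: int) -> float:
    """Convert one 8 bit sRGB channel to linear light."""
    c = channel_8bit / 255
    # The WCAG 2.x text lists 0.03928 here and the sRGB standard uses 0.04045.
    # For 8 bit channels the two thresholds give identical results.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a six digit hex color such as #1a2b3c."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5 to 1 for normal text and 3 to 1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(passes_aa("#767676", "#ffffff"))  # True
print(passes_aa("#777777", "#ffffff"))  # False
```

A pair measured at 3.9 to 1, like the accent flagged in the specimen, falls below the 4.5 threshold for normal text and so fails this check.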

06 · Where this is not the tool

Reach for judgment, not generation.

An operating model is also a set of edges. There are places the machine should not be in the loop at all, and naming them is part of the discipline.

When being subtly wrong is unbounded. Legal, medical, financial, and security calls carry a cost that plausible cannot be allowed to touch. The human decides, with a qualified human.
When you cannot specify the intent yet. If the thinking is the point, do the thinking. A model handed a vague intent returns a confident average, and the average is the thing you were trying to escape.
When the work is the relationship. A hard conversation or a judgment about a person is not a generation task. Outsourcing it is the cost, not the saving.
When you would ship what you cannot verify. If there is no gate for it, there is no green for it. Do not ship on faith.

07 · The model on one screen

The whole thing, at a glance.

The one idea

AI drifts toward plausible. The human holds the line on correct.

The seven drifts

System. Intent. Confidence. Agreement. Average. Memory. Accumulation.

The three moves

Guide the intent. Encode the standard. Make the final call.

The gate

Dashes, structure, machine readability, render, contrast. Green, then signed.

A · Starter scaffold

Drop this in and you are running the model.

A minimal version of the briefing layer and the gate, written so you can adapt it to your own stack. The persona sets the role. The pre ship checklist is the gate in plain text.

# brief.md  (pin intent before any generation)
Problem (one line): ____________________________________
Definition of done: ___________________________________
Hard constraints:   ___________________________________
Out of scope:       ___________________________________

# persona.md  (the role the model starts inside)
You are a [discipline] specialist. You hold the standards of
that craft. You propose, the human disposes. You cite real
sources only. You never present a guess in the voice of a fact.

# preship.md  (the gate, run not vibed)
[ ] dash sweep clean  (entities included)
[ ] no duplicate ids, tags balanced, anchors resolve
[ ] JSON-LD parses, inline scripts pass a syntax check
[ ] page boots in a smoke test, render checked by eye
[ ] color pairs measured, WCAG AA confirmed
[ ] whole reviewed against original intent, then signed
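
If the checklist lives in a file, a small script can refuse the signature while any box is open. This is a sketch under one assumption that the scaffold does not state: a passed item is marked by replacing the space in the brackets with an x. The names open_items and can_sign are hypothetical.

```python
import re

# A checklist line: "[ ]" for open, "[x]" for passed (assumed convention).
ITEM = re.compile(r"^\[( |x|X)\]\s+(.*)$")


def open_items(checklist: str) -> list[str]:
    """Return the text of every checklist line that is not ticked."""
    pending = []
    for line in checklist.splitlines():
        match = ITEM.match(line.strip())
        if match and match.group(1) == " ":
            pending.append(match.group(2))
    return pending


def can_sign(checklist: str) -> bool:
    """Green only when the checklist has items and none of them are open."""
    has_items = any(ITEM.match(line.strip()) for line in checklist.splitlines())
    return has_items and not open_items(checklist)


preship = """
# preship.md  (the gate, run not vibed)
[x] dash sweep clean  (entities included)
[x] no duplicate ids, tags balanced, anchors resolve
[ ] color pairs measured, WCAG AA confirmed
"""

print(can_sign(preship))    # False
print(open_items(preship))  # ['color pairs measured, WCAG AA confirmed']
```

An empty checklist counts as not signable, which matches the rule that work without a gate does not ship.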
