Verify what AI claims

Ground: read the answer, but see what's actually supported

Trace every claim. Show confidence and freshness. Flag what nothing backs up.

A fluent paragraph reads as one confident thing, but it's really a mix: some sentences are well-sourced, some rest on a single stale page, some are contested, and some are invented outright. Prose flattens all of that into the same calm voice, and fluency gets mistaken for accuracy. Ground is an exploration of putting provenance on the surface: render each claim as a traceable span, attach the sources, confidence, and freshness behind it, flag the claims with no support at all, and show source conflicts side by side instead of averaging them into a false consensus.

Open the full prototype See how conflicts are handled

Fully interactive. Click each highlighted claim to inspect its evidence, hedge the unsupported one, and resolve the source conflict, then watch the verifiability score move. Built on the Minia design system. Opens in desktop by default; use the toggle for the mobile layout. Data is synthetic.

The problem: fluency is not evidence

A generated answer reads as one confident voice. Underneath, every sentence sits on a different footing, and the interface hides which is which.

A hallucination doesn't arrive flagged. It arrives in the same even, well-formed prose as the sentence next to it that's backed by two solid sources. With nothing to distinguish them, people do the rational thing and trust the fluency, right up until a confident, unsourced number turns out to be invented. The fix isn't a disclaimer at the bottom; it's making the footing of each claim visible at the point you read it.

	Fluent answer no provenance	Grounded answer every claim traceable
Where claims come from	Unknown: sources, if any, are out of view	Each claim links to the sources behind it
Confidence	One uniform tone for everything	A per claim meter, strong vs shaky is visible
Freshness	No sense of whether a source is current or years old	“Updated 3 days ago” vs “16 months · stale” on the surface
Unsupported claims	Read exactly like the supported ones	Flagged “no source found,” one click to hedge
Conflicting sources	Averaged into one smooth, false consensus	Shown side by side, you pick or flag

The thesis: make provenance a property of every claim

Grounding isn't a label on the whole answer. It's an attribute each individual claim either has or doesn't, and the UI should carry it claim by claim.

Ground treats the answer as a set of claims, not a block of text. Each one is a span you can select to see exactly what's behind it: the sources, how current they are, and a confidence read that reflects both. A well-supported claim shows its receipts; a shaky one shows its thin support; an invented one shows nothing, and says so. The reader's trust gets to track the evidence instead of the prose.

A claim traced to its sources, rendered live in the Minia design system, the same theme as the prototype above.

Show the spread, not an average

A single trust score for a whole answer is its own kind of lie. Ground shows confidence and freshness per claim, so the strong and the shaky never blur together.

Two claims in the same sentence can be worlds apart: one anchored to a source updated days ago, the next leaning on a page that's sixteen months stale. Collapse them into one number and you've thrown away the only thing that matters. Ground keeps them separate, a confidence meter and a freshness badge on each claim, so the spread is visible. A high overall score with one rotten claim is exactly the situation the design refuses to hide.

Confidence and freshness, claim by claim, rendered live in the Minia design system, the same theme as the prototype above.

Show the disagreement, don't smooth it

When credible sources conflict, the helpful-sounding move is to average them into one confident sentence. That's the move Ground refuses.

Real questions often don't have a settled answer, and a model that flattens two opposing sources into a single claim manufactures a consensus that doesn't exist. Ground surfaces the conflict as a conflict: the disagreeing sources shown side by side, under a banner that names the tension, with the choice left to the person, trust one, or keep it flagged. Naming the disagreement is the design; it's more honest than any averaged number could be.

A conflict shown, not averaged, rendered live in the Minia design system, the same theme as the prototype above.

The payoff is calibrated trust

When provenance is visible, belief can track evidence, you trust the well sourced claims, doubt the thin ones, and catch the invented one before it travels.

Try it in the live prototype above: click the claim with no source and hedge it, resolve the conflict by picking a side, and watch the verifiability score climb as the answer gets more honest. Flip on Highlight unsupported only to strip away everything that's already solid and stare at just the risk. The review log keeps a record of every inspection, so a verdict is reproducible and a missed claim is diagnosable.

How it got here: v1 → v6

Grounding didn't start as a rich evidence panel. Each version moved verification closer to the claim, from a footnote nobody read to provenance you can see at a glance.

It began as a list of links at the bottom and ended as a property of every sentence. Each step shortened the distance between a claim and its evidence, until confidence, freshness, and conflict all lived on the claim itself, and an unsupported sentence could no longer hide among the sourced ones.

1
A list of sources at the end

Links dumped under the answer. Technically cited, practically unread, and unmapped to any specific claim.
2
+ Inline citation markers

Footnote numbers in the text. Better, but a number can't tell you whether the source actually says what the sentence claims.
3
+ Claim-level highlighting

Each claim became a selectable span with its own evidence. Provenance moved from the footer onto the sentence.
4
+ Confidence & freshness

Added a per-claim meter and an age badge, so a stale single source stopped looking as solid as a fresh, corroborated one.
5
+ Unsupported flags

Claims with no source got a rose “no source found” flag and a one click hedge. Invented sentences could no longer pass as fact.
6
+ Conflict surfacing & score · current

Disagreeing sources shown side by side, plus a verifiability score for the whole answer, honest about what's solid and what isn't.

Explore more work

More explorations from the AI Product Design Lab, each a different facet of making AI products people can direct, verify, supervise, and trust.

Steer, intent before generation

Turn an under-specified prompt into a negotiated brief: surface the assumptions, name the ambiguity, and decide before the model commits.

View exploration

Oversee, safe agent autonomy

A control surface for agents that take real actions: scope it, preview it with a dry run, interrupt it mid-task, and undo what's reversible.

View exploration

Recall, legible AI memory

A memory layer you can see, attribute, edit, scope, and revoke. Personalization as a negotiated, inspectable thing, not a black box.

View exploration

Explore the AI Product Design Lab