WIREFRAME — Criterion · The Scorecard (Screen 3 of the Practice flow)
Desktop-first web app · narrow centred column · Authenticated · Reached from Interview → End & see score ·
No design applied — structure, hierarchy, and copy intent only
Where this sits in the flow
Home › Set up › Interview › Scorecard › Debrief
This is the screen the whole product argues for: an AI judgment you can
read, trust, and talk back to. If a viewer remembers one frame from Criterion, it should be this one.
Stepper header
◎ Criterion
Set up ✓ — Interview ✓ — Score — Debrief
Headline — overall score is PROVISIONAL by design
Your scorecard · Senior PM
6.7 / 10
provisional · open to challenge
Disagree with any of these? Say so.
Every score below links to the exact words that earned it. If one
feels wrong, challenge it — it'll re-evaluate right here, in the open.
Criterion card — HERO (medium confidence on thin evidence, primed to be challenged)
● provisional · Handling ambiguity
Confidence Medium · weight 15%
5.0
Why this score: you narrowed the scope, but didn't
state the assumption you were making or what would change your mind. (1 quote · Q3)
Evidence (expanded)
Q3
"I'd narrow it to the riskiest segment and move."
▶ 6:18
Only one short quote matched — that's why confidence is Medium, not High.
Confidence is a function of evidence, not a vibe.
Criterion cards — the rest (same anatomy, lighter weight; all still challengeable)
State         | Criterion                | Conf.  | Wt  | Score
● scored      | Product sense (3 quotes) | High   | 30% | 7.5
● provisional | Structured thinking      | Medium | 25% | 6.0
● scored      | Communication            | High   | 20% | 8.0
● thin        | Self-awareness           | Low    | 10% | 6.0
Every row expands to the same "Why + evidence quotes + Challenge"
pattern as the hero card. No score is exempt from being contested.
Integrity banner — always visible
⛨ Roughly 1 in 4 challenges hold or lower the score — a real re-evaluation, not a rubber stamp.
When evidence can't settle it, Criterion returns human review ⚑ rather than faking certainty.
The challenge flow (inline, 4 steps)
Step               | What the user sees
1 · Make your case | Reason chips + free-text box
2 · Re-evaluate    | Spinner + visible checks against transcript/rubric/confidence
3 · Reasoning      | Plain-language explanation of the decision
4 · Outcome        | up · hold · lower · human review
Happens in place, inside the card — never a new page. The argument
stays next to the thing being argued about.
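A minimal sketch of the four-step inline flow as a linear state machine, for reference when naming Figma states. The names Step, NEXT_STEP, and advance are hypothetical, not part of any existing Criterion code; the only thing taken from the wireframe is the order of the four steps.

```python
from enum import Enum


class Step(Enum):
    """The four inline challenge steps, in the order the card shows them."""
    MAKE_CASE = 1     # reason chips + free-text box
    RE_EVALUATE = 2   # spinner + visible checks
    REASONING = 3     # plain-language explanation
    OUTCOME = 4       # up / hold / lower / human review


# Assumption: the flow is strictly linear, with no skipping or going back.
NEXT_STEP = {
    Step.MAKE_CASE: Step.RE_EVALUATE,
    Step.RE_EVALUATE: Step.REASONING,
    Step.REASONING: Step.OUTCOME,
}


def advance(step: Step) -> Step:
    """Return the step that follows `step`; OUTCOME is terminal."""
    if step is Step.OUTCOME:
        raise ValueError("the challenge flow ends at OUTCOME")
    return NEXT_STEP[step]


def walk_flow() -> list:
    """Walk the whole flow from the first step to the last."""
    steps = [Step.MAKE_CASE]
    while steps[-1] is not Step.OUTCOME:
        steps.append(advance(steps[-1]))
    return steps
```

Because the step lives on the card rather than on a route, the card can render whichever step is current without ever leaving the scorecard.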
Four possible outcomes (all legitimate)
Outcome      | Signal
Revised up   | e.g. 5 → 7, confidence rises
Held         | unchanged, with reasons shown
Lowered      | re-reading hurt the case; reasons shown
Human review | evidence can't settle it; defer to a person
"Lower" and "human review" MUST be real, reachable outcomes — or the
whole promise is theatre.
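One way to keep all four outcomes honest is to derive the outcome from the re-evaluated score rather than choosing it separately. The sketch below assumes a hypothetical convention where a re-evaluation returns a new score, or None when the evidence can't settle it; Outcome and classify are illustrative names only.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    REVISED_UP = "up"
    HELD = "hold"
    LOWERED = "lower"
    HUMAN_REVIEW = "human review"


def classify(old_score: float, new_score: Optional[float]) -> Outcome:
    """Map a re-evaluation result onto one of the four outcomes.

    Assumption: new_score is None when evidence can't settle the challenge.
    """
    if new_score is None:
        return Outcome.HUMAN_REVIEW
    if new_score > old_score:
        return Outcome.REVISED_UP
    if new_score < old_score:
        return Outcome.LOWERED
    return Outcome.HELD


# Reachability check: each outcome is produced by some input.
reachable = {classify(5.0, new) for new in (7.0, 5.0, 4.0, None)}
```

A check like `reachable == set(Outcome)` is a cheap guard that "lower" and "human review" never get quietly designed out.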
Confidence = evidence (dot states)
Dot           | Meaning
● scored      | firm, multiple matched quotes
● provisional | matched but thin / medium confidence
● thin        | low confidence: "show me more," not "you failed"
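The dot states line up one-to-one with the confidence levels used on the cards (High, Medium, Low). A minimal sketch of that mapping follows; Confidence, DOT_FOR_CONFIDENCE, and dot_state are hypothetical names, and the one-to-one mapping is inferred from the sample cards in this wireframe rather than from a written rule.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Assumption: each confidence level maps to exactly one dot state,
# matching the sample cards (High -> scored, Medium -> provisional, Low -> thin).
DOT_FOR_CONFIDENCE = {
    Confidence.HIGH: "scored",
    Confidence.MEDIUM: "provisional",
    Confidence.LOW: "thin",
}


def dot_state(confidence: Confidence) -> str:
    """Return the dot label shown on a criterion card."""
    return DOT_FOR_CONFIDENCE[confidence]
```

Keeping the mapping in one table means a card can never show a "scored" dot next to a Low confidence label.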
Overall score (spec)
total = Σ (criterion_score × weight)
weights set by the user on Set up
label = "provisional" until all criteria firm
a challenge that changes a score (revised up or lowered) updates total live; a held one leaves it unchanged
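A worked sketch of the spec above, using the five sample cards from this wireframe. It assumes weights are stored as fractions that sum to 1 (15% as 0.15) and that "firm" means the card's state is "scored"; Card and overall are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    score: float   # 0 to 10
    weight: float  # fraction of the total, e.g. 0.15 for 15%
    state: str     # "scored", "provisional", or "thin"


def overall(cards):
    """Return (total, is_provisional) for the scorecard headline."""
    if abs(sum(c.weight for c in cards) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    total = sum(c.score * c.weight for c in cards)
    # Assumption: the headline stays provisional until every card is "scored".
    is_provisional = any(c.state != "scored" for c in cards)
    return round(total, 1), is_provisional


cards = [
    Card("Handling ambiguity", 5.0, 0.15, "provisional"),
    Card("Product sense", 7.5, 0.30, "scored"),
    Card("Structured thinking", 6.0, 0.25, "provisional"),
    Card("Communication", 8.0, 0.20, "scored"),
    Card("Self-awareness", 6.0, 0.10, "thin"),
]

total, is_provisional = overall(cards)  # (6.7, True)

# A challenge that revises Handling ambiguity from 5 to 7 moves the total live.
cards[0].score = 7.0
revised_total, _ = overall(cards)  # 7.0
```

The sample numbers reproduce the 6.7 / 10 headline, which is a useful sanity check when the mock data in Figma changes.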
States to cover in Figma
Card collapsed / evidence expanded / challenge open
Re-evaluating (spinner) state
Each of the 4 outcomes
Low-confidence ("thin evidence") card
Evidence → Moment Replay overlay
Wireframe notes — for Figma reference
The overall number is labelled provisional · open to challenge from the moment the screen
loads — the score is framed as the start of a conversation, not a verdict.
Every criterion card carries the same anatomy: state dot, name, confidence + weight, score, a "Why this
score" line, See the evidence, and Challenge. No score is exempt.
Evidence is the user's own words — each quote tagged with its question (Q3) and a transcript timestamp that
opens the Moment Replay overlay. Nothing is scored against something the user can't see.
Confidence is presented as a function of evidence, not a mood: "Low" means thin matched evidence and reads
as "show me more," never "you failed."
The challenge flow runs inline inside the card in four steps (make your case → re-evaluate →
reasoning → outcome) — it never navigates away from the score being contested.
All four outcomes are real and reachable: revise up, hold, lower, or escalate to human review. That "Lower" and
"human review" can actually happen is what makes the feature trustworthy rather than a rubber stamp.
The re-evaluation shows its work — visible checks against the transcript, rubric, and confidence threshold —
so the result reads as reasoning, not a dice roll.
The integrity banner ("~1 in 4 challenges hold or lower") sits on the screen permanently, setting honest
expectations before anyone challenges.
A challenge outcome updates the overall total live and is recorded — the contest record later travels with
the score into the Debrief and beyond.
Core principle enforced by this screen: an AI judgment people can read, trust, and talk back to — the score
invites the argument instead of ending it.