The Scorecard [WIREFRAME]

Desktop-first web app · narrow centred column · Authenticated · Reached from Interview → End & see score · No design applied — structure, hierarchy, and copy intent only

Stepper header

◎ Criterion

Set up ✓ — Interview ✓ — Score — Debrief

Headline — overall score is PROVISIONAL by design

Your scorecard · Senior PM

6.7 / 10

provisional · open to challenge

Disagree with any of these? Say so.

Every score below links to the exact words that earned it. If one feels wrong, challenge it — it'll re-evaluate right here, in the open.

Criterion card — HERO (lowest-scoring, Medium confidence, primed to be challenged)

● provisional Handling ambiguity

Confidence Medium · weight 15%

5.0

Why this score: you narrowed the scope, but didn't state the assumption you were making or what would change your mind. (1 quote · Q3)

Evidence (expanded)

"I'd narrow it to the riskiest segment and move."

▶ 6:18

Only one short quote matched — that's why confidence is Medium, not High. Confidence is a function of evidence, not a vibe.

Criterion cards — the rest (same anatomy, lighter weight; all still challengeable)

State	Criterion	Conf.	Wt	Score
● scored	Product sense (3 quotes)	High	30%	7.5
● provisional	Structured thinking	Medium	25%	6.0
● scored	Communication	High	20%	8.0
● thin	Self-awareness	Low	10%	6.0

Every row expands to the same "Why + evidence quotes + Challenge" pattern as the hero card. No score is exempt from being contested.

Integrity banner — always visible

⛨ Roughly 1 in 4 challenges hold or lower the score — a real re-evaluation, not a rubber stamp. When evidence can't settle it, Criterion returns human review ⚑ rather than faking certainty.

The challenge flow (inline, 4 steps)

Step	What the user sees
1 · Make your case	Reason chips + free-text box
2 · Re-evaluate	Spinner + visible checks against transcript/rubric/confidence
3 · Reasoning	Plain-language explanation of the decision
4 · Outcome	up · hold · lower · human review

Happens in place, inside the card — never a new page. The argument stays next to the thing being argued about.
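
A minimal sketch of the four steps above as a linear sequence, assuming the card holds its own challenge state so nothing navigates away. The names ChallengeStep and next_step are hypothetical, not part of the spec.

```python
from enum import Enum
from typing import Optional


class ChallengeStep(Enum):
    """The four inline steps, in the order the table lists them."""
    MAKE_YOUR_CASE = 1
    RE_EVALUATE = 2
    REASONING = 3
    OUTCOME = 4


def next_step(step: ChallengeStep) -> Optional[ChallengeStep]:
    """Return the step that follows, or None once the outcome is shown."""
    if step is ChallengeStep.OUTCOME:
        return None
    return ChallengeStep(step.value + 1)
```

Keeping the step on the card, rather than in a route, is one way to guarantee the flow stays in place.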

Four possible outcomes (all legitimate)

Outcome	Signal
Revised up	e.g. 5 → 7, confidence rises
Held	unchanged, with reasons shown
Lowered	re-reading hurt the case — reasons shown
Human review	evidence can't settle it; defer to a person
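
The four outcomes in the table could be applied to a card roughly as follows. This is a sketch under assumptions: Card, Outcome, and apply_outcome are hypothetical names, and leaving the score untouched while a human review is pending is an assumption the spec does not state.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Outcome(Enum):
    REVISED_UP = "up"
    HELD = "hold"
    LOWERED = "lower"
    HUMAN_REVIEW = "human review"


@dataclass(frozen=True)
class Card:
    criterion: str
    score: float
    needs_human_review: bool = False


def apply_outcome(card: Card, outcome: Outcome,
                  new_score: Optional[float] = None) -> Card:
    """Return the card as it should look after a challenge resolves."""
    if outcome is Outcome.HELD:
        return card  # unchanged; reasons are shown elsewhere
    if outcome is Outcome.HUMAN_REVIEW:
        # Assumption: the score stays as-is until a person decides.
        return replace(card, needs_human_review=True)
    if new_score is None:
        raise ValueError("revised up / lowered needs a new score")
    if outcome is Outcome.REVISED_UP and new_score <= card.score:
        raise ValueError("revised up must raise the score")
    if outcome is Outcome.LOWERED and new_score >= card.score:
        raise ValueError("lowered must reduce the score")
    return replace(card, score=new_score)
```

All four branches are reachable from the same call, which mirrors the requirement that no outcome is decorative.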

"Lower" and "human review" MUST be real, reachable outcomes — or the whole promise is theatre.

Confidence = evidence (dot states)

Dot	Meaning
● scored	firm, multiple matched quotes
● provisional	matched but thin / medium confidence
● thin	low confidence — "show me more," not "you failed"
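
As an illustration, the dot can be derived directly from the confidence level, which is how the criterion table pairs them (High with scored, Medium with provisional, Low with thin). The names Confidence and dot_state are hypothetical, and the one-to-one mapping is an assumption read off the tables rather than a stated rule.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Assumed one-to-one mapping, taken from the rows of the criterion table.
DOT_STATE = {
    Confidence.HIGH: "scored",
    Confidence.MEDIUM: "provisional",
    Confidence.LOW: "thin",
}


def dot_state(confidence: Confidence) -> str:
    """Return the dot label shown on a card for a given confidence."""
    return DOT_STATE[confidence]
```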

Overall score (spec)

total = Σ (criterion_score × weight)
weights set by the user on Set up
label = "provisional" until all criteria firm
a revised-up or lowered challenge updates total live; a held one leaves it unchanged
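
A minimal sketch of the spec above, using the scores and weights shown on this screen. Criterion, overall_total, and is_provisional are hypothetical names; rounding to one decimal and requiring weights to sum to 100% are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    name: str
    score: float   # 0 to 10
    weight: float  # fraction of the total, e.g. 0.15 for 15%
    state: str     # "scored", "provisional", or "thin"


def overall_total(criteria: List[Criterion]) -> float:
    """total = sum of criterion_score * weight, rounded to one decimal."""
    if abs(sum(c.weight for c in criteria) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return round(sum(c.score * c.weight for c in criteria), 1)


def is_provisional(criteria: List[Criterion]) -> bool:
    """The headline stays provisional until every criterion is firm."""
    return any(c.state != "scored" for c in criteria)


cards = [
    Criterion("Handling ambiguity", 5.0, 0.15, "provisional"),
    Criterion("Product sense", 7.5, 0.30, "scored"),
    Criterion("Structured thinking", 6.0, 0.25, "provisional"),
    Criterion("Communication", 8.0, 0.20, "scored"),
    Criterion("Self-awareness", 6.0, 0.10, "thin"),
]
print(overall_total(cards))   # 6.7
print(is_provisional(cards))  # True
```

Recomputing the total from the cards after each challenge, rather than patching the old number, keeps the headline consistent with whatever the cards currently show.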

States to cover in Figma

Card collapsed / evidence expanded / challenge open
Re-evaluating (spinner) state
Each of the 4 outcomes
Low-confidence ("thin evidence") card
Evidence → Moment Replay overlay

Wireframe notes — for Figma reference

The overall number is labelled provisional · open to challenge from the moment the screen loads — the score is framed as the start of a conversation, not a verdict.
Every criterion card carries the same anatomy: state dot, name, confidence + weight, score, a "Why this score" line, See the evidence, and Challenge. No score is exempt.
Evidence is the user's own words — each quote tagged with its question (Q3) and a transcript timestamp that opens the Moment Replay overlay. Nothing is scored against something the user can't see.
Confidence is presented as a function of evidence, not a mood: "Low" means thin matched evidence and reads as "show me more," never "you failed."
The challenge flow runs inline inside the card in four steps (make your case → re-evaluate → reasoning → outcome) — it never navigates away from the score being contested.
All four outcomes are real and reachable: revise up, hold, lower, or escalate to human review. "Lower" and "human review" existing is what makes the feature trustworthy rather than a rubber stamp.
The re-evaluation shows its work — visible checks against the transcript, rubric, and confidence threshold — so the result reads as reasoning, not a dice roll.
The integrity banner ("~1 in 4 challenges hold or lower") sits on the screen permanently, setting honest expectations before anyone challenges.
A challenge outcome updates the overall total live and is recorded — the contest record later travels with the score into the Debrief and beyond.
Core principle enforced by this screen: an AI judgment people can read, trust, and talk back to — the score invites the argument instead of ending it.