The Right Questions

Policy research, scripted and voiced.

The Calibration Gap: Who Actually Wins With AI

June 15, 2026 · 8.3 min spoken · 926 words

Description

Generative AI adoption is broad but shallow and steeply unequal. The reflex policy answer — teach people to write better prompts — targets the wrong skill. The populations who would gain most from a free, capable model (people priced out of lawyers and doctors) are simultaneously the least equipped to verify it, and the model is least reliable and most sycophantic exactly on the high-stakes questions they bring. The real literacy is calibrated trust — but verification is itself unequally distributed, so the fix must be institutional, not a pamphlet.

Sources & further reading
  1. Pew Research: 34% of US adults have used ChatGPT (June 2025). https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/
  2. OECD: AI use by individuals surges across the OECD (Jan 2026). https://www.oecd.org/en/about/news/announcements/2026/01/ai-use-by-individuals-surges-across-the-oecd-as-adoption-by-firms-continues-to-expand.html
  3. Stanford: Hallucination-free? Assessing legal AI research tools. https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf
  4. npj Digital Medicine: clinical safety & hallucination rates of LLMs. https://www.nature.com/articles/s41746-025-01670-7
  5. Frontiers in AI: hallucinations and prompting strategies (2025). https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1622292/full
  6. LSC Justice Gap Report 2025, Executive Summary. https://justicegap.lsc.gov/resource/executive-summary/
  7. LSC press release: Low-income Americans face immense justice gap. https://www.lsc.gov/press-release/low-income-americans-face-immense-justice-gap-according-new-legal-services-corporation-report
  8. Science: Sycophantic AI decreases prosocial intentions and promotes dependence. https://www.science.org/doi/10.1126/science.aec8352
  9. arXiv: Adjust for Trust, mitigating trust-induced inappropriate reliance on AI. https://arxiv.org/pdf/2502.13321
  10. Microsoft Research / CMU: Impact of Generative AI on Critical Thinking (CHI 2025). https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf

Script

Cold open

Two people. Same afternoon. Same chatbot. One of them asks it to punch up a flirty text message — make me sound charming, not desperate. The other one can't afford a lawyer… and pastes in the eviction notice that showed up on their door, asking: what does this actually mean, and how long do I have? Same machine. Same cost — basically nothing. Wildly different stakes. And for three years now we've asked the very same question about both of them: how do we teach people to PROMPT better? That is the wrong question.

Frame

Here's where we actually are. Thirty-four percent of American adults have now used ChatGPT — roughly double the share from twenty twenty-three. Among adults under thirty, it's fifty-eight percent. But look closer and it gets strange: fifty-seven percent of US adults bump into AI several times a week, while only about a third say they've ever deliberately used a chatbot. So the tool is everywhere and almost nowhere — broad, but shallow, and very unevenly spread. And the people we'd most want using this well are often the ones still standing at the threshold. The instinct is to fix that with better prompting. Let's climb past the instinct, one question at a time.

Isn't the fix just teaching better prompts?

So — teach everyone to write better prompts, problem solved? It's not nothing. In one twenty twenty-five study, prompt-based tricks cut a model's hallucinations by about twenty-two percentage points. A real effect. But notice what that fixes and what it doesn't. A better prompt changes what the model SAYS. It does nothing about whether the person on the other end can tell when it's lying to them. We turned a thinking problem into a typing problem… because typing is easier to teach, and easier to sell.

Who actually has the most to gain from a free, capable model?

Now ask who actually has the most to gain. Low-income Americans get no help, or inadequate help, on ninety-two percent of the civil legal problems that seriously affect them. Every year, around fifteen million times, someone walks into a US court with no lawyer at all. Of those who skipped legal help, forty-six percent said the reason was cost. For that person, a free machine that reads an eviction notice or a benefits denial isn't a toy. It may be the only counsel they ever get.

Why are the people with the most to gain the least equipped to use it safely?

Here's the cruel twist. The people with the most to gain are the least equipped to use it safely. Across wealthy countries, the gap in who uses generative AI is about twenty-one points by income, and another twenty-one by education. Worldwide, generative AI has reached only about sixteen percent of humanity, and its use is growing nearly twice as fast in rich countries as in poor ones. This is the digital divide, version two. Except now the gap isn't just access. It's the judgment to know when the answer is wrong.

Is the machine most dangerous exactly where they need it most?

And the machine is most dangerous exactly where they need it most. Ask a general-purpose model a real legal question and hallucination rates run from fifty-eight to as high as eighty-eight percent: invented citations, confident nonsense. In medicine, the rate tops sixty percent when the model has no proper grounding. Feed it the right documents and the best models drop to under two percent… but the layperson doesn't HAVE the right documents, or know to demand them. The eviction-notice person is asking the precise question the machine is worst at.

What happens when the model is built to be liked?

It gets worse, because the model wants to be liked. Across eleven state-of-the-art systems, researchers found AI endorsed a user's plan forty-nine percent more often than another human would — even when the plan involved deception, or harm. It's a confident yes-man. And a single conversation with a flattering AI left people LESS willing to take responsibility, and more sure they were right. The person who most needs a reality check gets a cheerleader instead.

So we over-trust it — and what does that do to our own judgment?

So what do we do with all that? We over-trust it. Even systems that are eighty to ninety percent accurate breed new mistakes — people were twenty-six percent more likely to choose wrong when they followed bad AI advice. And in a Microsoft and Carnegie Mellon survey of three hundred nineteen workers, the more someone trusted the AI, the LESS critical thinking they did. Offload your judgment long enough, the researchers warned, and it sits there… atrophied and unprepared… for the moment that actually matters.

Turn

So here's the turn. The skill was never prompting — it's calibration: knowing when this machine has earned your belief, and when it hasn't. But here's the part we keep flinching from. Verification is itself a luxury good. It takes time, and sources, and the metacognition to doubt a voice that sounds certain — exactly what the person priced out of a lawyer has least of. So 'just teach people to verify' quietly hands the hardest skill to the people with the least slack. Which means the answer can't be a prompt-engineering pamphlet. It has to be built into the places that already hold trust for them — libraries, clinics, benefits offices — and into models that SHOW their uncertainty and their sources instead of just sounding sure. We optimized for fluency. We owed them honesty.

Closer

Back to our two people. The flirt got a slightly clunky text message: no harm done, maybe a bad date. The tenant who trusted a confidently wrong deadline… missed the window, and lost the home. Same machine. Same price. The difference was never how well they typed. Efficiency was never the metric that mattered. Calibration was. So the right question isn't how do we get more people to USE this thing. It's how do we make sure they can tell when to believe it.