OpenAI's Whisper hallucinations inject fabricated medications and racial commentary into 7 million medical visit transcripts used by 30,000 clinicians
OpenAI's Whisper speech-to-text model, used by approximately 30,000 clinicians across 40 health systems via Nabla's integration, fabricates text in roughly 1.4% of audio segments -- inventing fictional medications like 'hyperactivated antibiotics,' inserting racial commentary, and generating violent language that never appeared in the original audio. At the scale of 7 million medical visits transcribed, that means at least tens of thousands of patient records contain AI-hallucinated content.

Why it matters: Fabricated medication names enter patient medical records. Clinicians making treatment decisions from those records may prescribe contraindicated drugs or miss actual medications, exposing patients to adverse drug events or gaps in care. Hospitals in turn face malpractice liability for AI-corrupted documentation they trusted as accurate, and the medical profession's willingness to adopt beneficial AI transcription tools is undermined by a single model's failure mode.

The structural root cause is that Whisper was designed as a general-purpose speech recognition model and was never validated for clinical use, yet no FDA clearance or clinical validation is required for AI transcription tools because they are classified as administrative rather than diagnostic. And unlike competing tools from Google, Amazon, AssemblyAI, and RevAI, which showed no hallucinations in the same comparison, Whisper's architecture generates text even from silence or background noise.
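That last point is straightforward to probe. The sketch below uses the open-source openai-whisper Python package rather than Nabla's production pipeline (an assumption for illustration only): it feeds the model pure digital silence and checks whether it still emits text. A non-empty transcript of an empty signal is a hallucination by definition.

```python
# Minimal sketch: probe whether Whisper emits text for pure silence.
# Assumes the open-source `openai-whisper` package (pip install openai-whisper);
# this illustrates the failure mode, it is not Nabla's production setup.
import numpy as np
import whisper

model = whisper.load_model("base")

# 30 seconds of digital silence at Whisper's expected 16 kHz sample rate.
silence = np.zeros(16_000 * 30, dtype=np.float32)

result = model.transcribe(silence, fp16=False)
print(repr(result["text"]))  # a robust ASR model should print '' here;
                             # any non-empty output is fabricated text
```

Because Whisper's decoder is trained to produce a fluent token sequence for whatever audio it receives, low-information input like silence or noise can still yield confident, fabricated text.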
Evidence
A peer-reviewed study analyzing 13,140 audio segments found that 187 (~1.4%) contained Whisper hallucinations, documenting fabricated content that included fictional medications and racial commentary. Nabla confirmed approximately 7 million medical visits transcribed and 30,000 clinician users across 40 health systems. A machine learning engineer independently found hallucinations in about half of the more than 100 hours of Whisper transcriptions he reviewed. Competing models from Google, Amazon, AssemblyAI, and RevAI produced no hallucinated text in the same comparison. Sources: Fortune (October 26, 2024); AP/PBS News (2024); Science/AAAS (2024); Cornell Chronicle (June 2024).
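For scale, here is the arithmetic behind the headline figures. One caveat: the study's rate is per audio segment, and a visit contains many segments, so applying it directly to visits is an assumption that yields only an order-of-magnitude floor, not a figure from the study.

```python
# Back-of-envelope arithmetic from the figures reported above.
segments_analyzed = 13_140
hallucinated_segments = 187
rate = hallucinated_segments / segments_analyzed   # ~0.0142, i.e. ~1.4%

visits_transcribed = 7_000_000
# Illustrative floor: treat the per-segment rate as if it were a per-visit
# rate (an assumption; with many segments per visit the true share of
# affected visits is likely higher).
affected_floor = rate * visits_transcribed

print(f"Per-segment hallucination rate: {rate:.2%}")
print(f"Illustrative floor: ~{affected_floor:,.0f} of {visits_transcribed:,} visits")
```

Even this conservative mapping lands near 100,000 affected visits, consistent with the article's "at least tens of thousands" framing.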