The Impact of AI Transcription on Journalism

Manual transcription of recorded interviews is one of the most time-consuming routine tasks in journalism. A one-hour interview takes an average of four hours to transcribe accurately by hand, time that displaces reporting, analysis, and writing. AI transcription has effectively eliminated this bottleneck: tools such as OpenAI Whisper, Otter.ai, Rev.ai, Descript, and Riverside achieve 90–97% word accuracy on clear English recordings, reducing the task to a review-and-correct workflow that takes 20–30 minutes for a one-hour interview.

Leading AI Transcription Tools for Journalists

OpenAI Whisper is an open-source speech recognition model released in 2022 that supports 99 languages and achieves state-of-the-art word error rates on clean audio. It can be run locally at no cost, making it suitable for newsrooms handling sensitive source recordings that should not be sent to external servers. Its word error rate on clear English speech is approximately 3–5%.

Otter.ai is the transcription tool most widely used by journalists, offering real-time transcription of phone calls and meetings (via Zoom, Teams, and Meet integrations), automatic speaker identification, and AI-generated summaries of transcribed content. It is particularly valued for quick post-interview review and quote identification. Its accuracy on clear speech with minimal background noise is comparable to Whisper's.

Descript combines transcription with audio/video editing — journalists can edit recordings by editing the transcript text, making it a powerful tool for podcast and multimedia journalists who need both transcription and production capabilities.

Rev.ai offers both AI transcription (fast, lower cost) and human transcription services (slower, higher accuracy) — allowing newsrooms to route sensitive or quality-critical transcriptions to human reviewers while using AI for routine material.

Accuracy Limitations and Verification Requirements

All AI transcription tools make errors, and the errors tend to be systematic rather than random: proper nouns (names of people, places, organisations) are transcribed incorrectly more often than common words; technical vocabulary, accented speech, and overlapping speakers all degrade accuracy; and audio quality issues (background noise, phone compression, room echo) can push word error rates to 15% or higher. These systematic errors are particularly problematic for journalism — a name misspelled in a quote misattributes the statement and may constitute a factual error or defamation risk.
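The accuracy figures cited throughout this piece are based on word error rate (WER): substitutions, deletions, and insertions divided by the number of words in the reference transcript. As a minimal sketch (a from-scratch word-level Levenshtein computation, not taken from any transcription tool's documentation), this is how a 15% WER claim would actually be measured:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as word-level Levenshtein edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# A single misheard proper noun in a four-word quote is already a 25% WER:
print(word_error_rate("the mayor of springfield", "the mare of springfield"))
```

Note that WER weights every word equally, which is exactly why the headline percentages understate the journalistic risk: a 3% error rate concentrated on names and numbers is far more damaging than 3% spread over articles and prepositions.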

Professional practice requires human review of all AI transcription before use in published quotes, with particular attention to proper nouns, numbers, and any statement that could be legally sensitive.
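That review pass can be partly automated by flagging the error-prone categories for the human editor. The sketch below uses a hypothetical helper, `flag_for_review`, built on simple heuristics (mid-sentence capitalization as a proxy for proper nouns, plus any token containing a digit); it is an illustration of the triage idea, not a feature of any of the tools named above, and it will miss proper nouns that open a sentence:

```python
import re

def flag_for_review(transcript: str) -> list:
    """Return tokens a human reviewer should verify: numbers, and
    mid-sentence capitalized words (likely proper nouns)."""
    flagged = []
    # Split into sentences at terminal punctuation followed by whitespace.
    for sentence in re.split(r'(?<=[.!?])\s+', transcript):
        for i, word in enumerate(sentence.split()):
            token = word.strip('.,!?;:"\'')
            if not token:
                continue
            if re.search(r'\d', token):
                flagged.append(token)   # numbers: check against notes/recording
            elif i > 0 and token[0].isupper():
                flagged.append(token)   # mid-sentence capital: likely a name
    return flagged

print(flag_for_review("The council approved 3 permits. Maria Delgado objected."))
# Flags "3" and "Delgado" for verification against the original audio.
```

In practice, a newsroom would replace these heuristics with a proper named-entity recognizer, but even this crude filter concentrates reviewer attention on exactly the tokens where AI transcription most often fails.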