What Are Deepfakes?
A deepfake is synthetic media — video, audio, image, or text — generated or manipulated using artificial intelligence to represent events or statements that never occurred. The term originated from a Reddit username ("deepfakes") that became associated with AI-generated face-swapping videos in 2017, but has since expanded to cover all forms of AI-generated synthetic media used to deceive.
Deepfakes are produced using generative adversarial networks (GANs), diffusion models (DALL-E, Stable Diffusion, Midjourney), voice cloning systems (ElevenLabs, Resemble AI), and large language models. The technology has advanced so rapidly that as of 2025, photorealistic synthetic video of real individuals making statements they never made can be produced by a skilled practitioner in under an hour.
The Threat to Journalism
Deepfakes represent a qualitative shift in the information manipulation threat landscape for several reasons. First, they exploit human cognitive biases that treat video as more credible than text — people are wired to believe "seeing is believing," and sophisticated deepfakes abuse this instinct at scale. Second, even after a deepfake is debunked, its spread on social media often outpaces corrections; relatedly, the mere existence of deepfakes creates a "liar's dividend," letting bad actors dismiss authentic videos as fakes. Third, the cost of production has collapsed — what required industrial-scale computational resources in 2019 can be done with consumer hardware in 2025.
How Deepfake Detection Works
Deepfake detection systems use several complementary technical approaches:
- Facial artifact detection: GANs and diffusion models leave subtle but detectable artifacts in generated faces — inconsistent eye reflections, unnatural skin texture, irregular ear geometry, and blurring at hair boundaries. Convolutional neural networks trained on thousands of known deepfakes can detect these artifacts with high accuracy on studio-quality fakes.
- Physiological signal analysis: Real human faces show subtle photoplethysmographic signals — light absorption patterns that change with heartbeat and blood flow. Deepfake faces typically lack these biologically grounded signals, which detection algorithms can identify.
- Audio-visual synchronisation: Deepfake videos frequently show subtle mismatches between lip movements and phoneme timing. Audio-visual consistency analysis compares the facial muscle movements expected for each sound against actual video evidence.
- Provenance metadata analysis: Real video typically contains consistent metadata (camera model, GPS data, creation timestamp) that is difficult to fabricate convincingly. Missing or inconsistent metadata is a red flag for manipulation.
- Cross-reference with authoritative sources: The most reliable verification approach for news purposes is reverse image/video search and comparison with authenticated original sources. If a video purportedly shows a politician's statement from a specific date, checking the authoritative archive of their statements from that date provides definitive verification.
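The photoplethysmography idea can be sketched with a toy signal. A naive DFT finds the dominant rhythm in a face region's brightness trace; a genuine face should show a peak in the plausible heart-rate band (roughly 0.7–4 Hz), while a synthetic face typically shows no such rhythm. The trace below is simulated — a real detector would extract it from skin pixels and use far more robust spectral estimation.

```python
import math
import random

def dominant_freq(signal, fps):
    """Return the frequency (Hz) with the largest DFT magnitude,
    ignoring the DC component. Naive O(n^2) DFT -- fine for a short clip."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fps / n  # convert DFT bin index to Hz

fps, seconds = 30, 10
n = fps * seconds
random.seed(0)
# Simulated "real" face trace: ~72 bpm pulse (1.2 Hz) plus sensor noise.
real = [0.5 * math.sin(2 * math.pi * 1.2 * t / fps) + random.gauss(0, 0.2)
        for t in range(n)]
pulse_hz = dominant_freq(real, fps)
in_heart_band = 0.7 <= pulse_hz <= 4.0  # plausible human heart-rate band
```

A detector built this way would flag footage whose face region shows no dominant rhythm in the heart-rate band, though compression and lighting changes make the real signal much noisier than this sketch.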
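The audio-visual synchronisation check can be illustrated as a lag search: correlate a mouth-opening trace against the audio amplitude envelope and find the frame offset where they align best. A genuine recording should peak near lag 0; a lip-synced fake often peaks elsewhere. The signals here are toy stand-ins, not real landmark or audio features.

```python
import math

def best_lag(mouth, audio, max_lag=10):
    """Return the frame lag maximising Pearson correlation between a
    mouth-opening trace and the audio amplitude envelope."""
    def corr(a, b):
        n = min(len(a), len(b))
        ma, mb = sum(a[:n]) / n, sum(b[:n]) / n
        num = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
        da = sum((a[i] - ma) ** 2 for i in range(n)) ** 0.5
        db = sum((b[i] - mb) ** 2 for i in range(n)) ** 0.5
        return num / (da * db) if da and db else 0.0
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: corr(mouth[max(lag, 0):], audio[max(-lag, 0):]))

# Toy syllable rhythm as the audio envelope.
envelope = [abs(math.sin(t / 3.0)) for t in range(120)]
genuine = envelope[:]                     # mouth tracks audio exactly
shifted = envelope[6:] + [0.0] * 6        # fake: lips lead audio by 6 frames
```

In practice the features would come from facial-landmark tracking and a speech envelope, and a detector would also compare expected phoneme-to-viseme shapes, but the alignment test is the core idea.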
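The provenance check amounts to a set of consistency rules over whatever metadata the file carries. The sketch below uses hypothetical field names (`camera_model`, `created_utc`, and so on) — real EXIF and container metadata varies by format — but shows the kind of red-flag logic involved.

```python
def metadata_red_flags(meta):
    """Return a list of provenance warnings for a media file's metadata dict.
    Field names here are hypothetical placeholders, not real EXIF tags."""
    flags = []
    for field in ("camera_model", "created_utc", "software"):
        if not meta.get(field):
            flags.append(f"missing {field}")
    # A creation time later than the upload time cannot be genuine.
    if meta.get("created_utc") and meta.get("uploaded_utc"):
        if meta["created_utc"] > meta["uploaded_utc"]:  # ISO 8601 strings sort lexically
            flags.append("creation timestamp later than upload timestamp")
    return flags

suspect = {"created_utc": "2025-06-01T12:00:00Z",
           "uploaded_utc": "2025-05-30T09:00:00Z"}
warnings = metadata_red_flags(suspect)
```

Such rules only raise suspicion — metadata can be stripped or forged — which is why they are combined with the signal-level checks above.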
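Reverse image/video search rests on perceptual hashing: reduce a frame to a tiny fingerprint that survives re-encoding, then compare fingerprints by Hamming distance. This is a minimal average-hash sketch over a flat list of grayscale values standing in for a downscaled 8×8 frame; production systems index billions of hashes.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set if the pixel is brighter than
    the frame mean. `pixels` is a flat 8x8 grayscale grid (a real pipeline
    would downscale a video frame to this grid first)."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

# Archived original frame vs. a brightness-shifted re-encode vs. unrelated content.
original = [i % 17 for i in range(64)]
recompressed = [p + 1 for p in original]        # uniform brightness shift
unrelated = [(i * 7 + 3) % 23 for i in range(64)]

d_near = hamming(average_hash(original), average_hash(recompressed))
d_far = hamming(average_hash(original), average_hash(unrelated))
```

A match against an authenticated archive frame (small distance) supports the footage; a purported "original" with no archive match at all warrants deeper checks.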
Limitations of Deepfake Detection
Detection technologies are in an adversarial arms race with generation technologies. As detection methods improve, those very detectors are used adversarially to train the next generation of synthetic media systems to evade them. Current detection systems achieve 90–95% accuracy on benchmark datasets but perform significantly worse on "in-the-wild" deepfakes optimised to evade detection. Detection accuracy also degrades significantly when deepfakes are compressed through social media platforms (which alter the very artifacts detectors rely on), or when source video quality is low.
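The compression problem can be made concrete with a toy 1-D illustration: many detectable generation artifacts are high-frequency residue, and re-encoding (crudely approximated here by a box blur) strips exactly that residue. The signal and "artifact" below are synthetic stand-ins, not real image data.

```python
import random

def hf_energy(xs):
    """High-frequency energy proxy: sum of squared adjacent differences."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))

def box_blur(xs, k=3):
    """Simple 1-D box blur, a crude stand-in for platform re-compression."""
    half = k // 2
    out = []
    for i in range(len(xs)):
        window = xs[max(i - half, 0):i + half + 1]
        out.append(sum(window) / len(window))
    return out

random.seed(1)
smooth = [i / 64 for i in range(64)]  # clean luminance gradient
# Checkerboard-like high-frequency residue, loosely mimicking GAN artifacts.
artifact = [v + random.choice((-0.05, 0.05)) for v in smooth]

before = hf_energy(artifact)
after = hf_energy(box_blur(artifact))  # most of the detectable signal is gone
```

The artifact signature dominates the raw signal's high-frequency energy but largely vanishes after one pass of smoothing — which is why detectors trained on pristine fakes degrade on platform-recompressed uploads.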