Live Accuracy Evaluation
Measured on 122 live claims, April 2026
The live accuracy figure comes from running Omniscient AI's VERIFAID system against real, labeled claims drawn from the FEVER and LIAR benchmark datasets. Each claim is verified independently, and the system's verdict is compared against the human-annotated ground-truth label. The result is not self-reported; it is computed directly from the system's outputs, as illustrated in the scoring sketch after the table below.
| Evaluation Run | Dataset | Claims Tested | Correct | Accuracy | F1 Score | Date |
|---|---|---|---|---|---|---|
| No completed evaluation runs on record. | ||||||
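To make the scoring step concrete, the minimal sketch below shows how a verdict-versus-label comparison can produce the accuracy and F1 columns of the table above. The record format, field names, and example claims are illustrative assumptions rather than VERIFAID's actual output schema, and macro-averaged F1 is used here as one reasonable choice; the report does not state which averaging the system uses.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical records: each pairs the system's verdict with the human-annotated
# label from FEVER/LIAR. Field names and values are placeholders, not real outputs.
results = [
    {"claim_id": "fever-0001", "verdict": "SUPPORTED", "ground_truth": "SUPPORTED"},
    {"claim_id": "liar-0042",  "verdict": "REFUTED",   "ground_truth": "SUPPORTED"},
    {"claim_id": "fever-0002", "verdict": "REFUTED",   "ground_truth": "REFUTED"},
]

y_true = [r["ground_truth"] for r in results]
y_pred = [r["verdict"] for r in results]

accuracy = accuracy_score(y_true, y_pred)             # fraction of exact matches
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1

print(f"Accuracy: {accuracy:.3f}  Macro-F1: {macro_f1:.3f}")
```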
Peer-Reviewed Benchmark Datasets
The 1,029,032 benchmark universe figure represents the combined total of labeled claims across seven independent, peer-reviewed academic datasets. These datasets were published at top-tier NLP venues (ACL, EMNLP, NAACL), and each is publicly downloadable and independently verifiable. Omniscient AI's underlying methodology is validated against this full universe.
| Dataset | Claims | Description | Labeled by | Published in |
|---|---|---|---|---|
| FEVER | 185,445 | Fact Extraction and VERification; claims verified against Wikipedia evidence | Human annotators via crowdsourcing | Thorne et al. (2018) NAACL-HLT 2018 |
| LIAR (PolitiFact) | 12,836 | Benchmark dataset for fake news detection built from PolitiFact statements | PolitiFact journalists | Wang (2017) ACL 2017 |
| MultiFC | 36,534 | Multi-domain fact-checking with real-world claims from 26 outlets | Professional fact-checkers from 26 outlets | Augenstein et al. (2019) EMNLP 2019 |
| VitaminC | 488,002 | Contrastive claim verification built from Wikipedia revision history | Human annotators on Wikipedia revision history | Schuster et al. (2021) ACL 2021 |
| FaVIQ | 188,000 | FAct Verification from Information-seeking Questions | Human annotators; claims derived from Natural Questions | Park et al. (2021) EMNLP 2021 |
| FEVEROUS | 87,026 | Fact Extraction and VERification Over Unstructured and Structured information; extends FEVER to Wikipedia tables and infoboxes, requiring multi-hop reasoning across text and structured data | Human annotators; Wikipedia table + text evidence | Aly et al. (2021) EMNLP 2021 |
| X-FACT (25 languages) | 31,189 | Multilingual fact checking covering 25 languages from 12 international fact-checking outlets; the first multilingual real-world fact-checking benchmark | International fact-checking journalists (25 languages) | Gupta & Srikumar (2021) ACL Findings 2021 |
| Total Benchmark Universe | 1,029,032 | Combined across all seven peer-reviewed datasets | — | — |
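Because the per-dataset counts are published, the combined benchmark-universe figure can be re-derived directly from the table above; the short check below does exactly that.

```python
# Claim counts taken from the table above; summing them reproduces the
# combined benchmark-universe figure.
dataset_sizes = {
    "FEVER": 185_445,
    "LIAR": 12_836,
    "MultiFC": 36_534,
    "VitaminC": 488_002,
    "FaVIQ": 188_000,
    "FEVEROUS": 87_026,
    "X-FACT": 31_189,
}

total = sum(dataset_sizes.values())
print(f"{total:,}")  # 1,029,032
```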
Methodology
VERIFAID (Lopez-Joya et al., 2025) is a role-based prompting framework combined with retrieval-augmented generation. The system retrieves the six most relevant evidence chunks from a live knowledge base of over 1,133 authoritative sources, weights each chunk by the trust tier of its source, and uses multiple AI models to reach a consensus verdict; illustrative sketches of the retrieval and consensus steps follow below.
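A rough sketch of what that retrieval-and-weighting step could look like is shown below. The tier names, weight values, data fields, and the top_k_evidence helper are assumptions made for illustration, not the published VERIFAID implementation.

```python
from dataclasses import dataclass

# Assumed trust-tier weights; the real tiers and values are not published here.
TIER_WEIGHTS = {"primary": 1.0, "fact_checker": 0.9, "news_wire": 0.8, "other": 0.6}

@dataclass
class EvidenceChunk:
    source: str        # e.g. "WHO" or "Reuters Fact Check"
    trust_tier: str    # key into TIER_WEIGHTS
    text: str
    similarity: float  # relevance score from the retriever, assumed in [0, 1]

def top_k_evidence(chunks: list[EvidenceChunk], k: int = 6) -> list[EvidenceChunk]:
    """Rank retrieved chunks by relevance scaled by source trust and keep the top k."""
    ranked = sorted(
        chunks,
        key=lambda c: c.similarity * TIER_WEIGHTS.get(c.trust_tier, 0.5),
        reverse=True,
    )
    return ranked[:k]
```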
AI Models Used
- GPT-4o Mini
- Perplexity Sonar Pro
- Google Gemini 2.5 Flash
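The consensus step across the three models listed above is not specified in detail in this report, so the sketch below assumes a simple majority vote with a conservative fallback when the models disagree; the model keys are placeholders.

```python
from collections import Counter

def consensus_verdict(model_verdicts: dict[str, str]) -> str:
    """Return the majority verdict, or a conservative fallback when there is no majority."""
    counts = Counter(model_verdicts.values())
    verdict, votes = counts.most_common(1)[0]
    if votes > len(model_verdicts) / 2:
        return verdict
    return "NOT ENOUGH INFO"

# Placeholder verdicts for the three models listed above.
print(consensus_verdict({
    "gpt-4o-mini": "REFUTED",
    "sonar-pro": "REFUTED",
    "gemini-2.5-flash": "SUPPORTED",
}))  # REFUTED
```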
Ground Truth Sources
- PolitiFact, Snopes, FullFact, FactCheck.org
- Reuters Fact Check, AP Fact Check, BBC
- CDC, WHO, NASA, IPCC, ICJ
- FEVER, LIAR, MultiFC, VitaminC, FaVIQ datasets
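One way the sources above could feed the trust-tier weighting mentioned in the Methodology section is a simple source-to-tier mapping like the sketch below; the groupings are illustrative assumptions, not VERIFAID's published configuration.

```python
# Illustrative tier groupings; the actual assignments are not published in this report.
GROUND_TRUTH_TIERS = {
    "primary": ["CDC", "WHO", "NASA", "IPCC", "ICJ"],
    "fact_checker": ["PolitiFact", "Snopes", "FullFact", "FactCheck.org"],
    "news_wire": ["Reuters Fact Check", "AP Fact Check", "BBC"],
    "benchmark": ["FEVER", "LIAR", "MultiFC", "VitaminC", "FaVIQ"],
}

# Inverted lookup used when weighting an individual evidence chunk by its source.
SOURCE_TO_TIER = {
    source: tier for tier, sources in GROUND_TRUTH_TIERS.items() for source in sources
}
```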
Test Claim Sets
- Standard (14 claims) — General health, science, environment
- Hard (10 claims) — Nuanced, time-sensitive, contested topics
- Live News (17 claims) — 2024–2026 current events disinformation
- Extended (250 claims) — LIAR-derived, FEVER-derived, India-specific, Media
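For orientation, a test claim in any of these sets could be represented by a record along the lines of the sketch below; the field names and example values are assumptions, not the actual storage format.

```python
from dataclasses import dataclass

@dataclass
class TestClaim:
    claim_set: str  # "standard" | "hard" | "live_news" | "extended"
    text: str       # the claim under verification
    label: str      # human-annotated ground truth, e.g. "SUPPORTED" or "REFUTED"
    origin: str     # e.g. "LIAR-derived", "FEVER-derived", "India-specific", "Media"

# Placeholder example, not a real claim from the test sets.
example = TestClaim(
    claim_set="standard",
    text="Placeholder claim about a general health topic.",
    label="SUPPORTED",
    origin="curated",
)
```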
Academic Foundation
Based on VERIFAID (Lopez-Joya et al., 2025), Computers & Electrical Engineering, Vol. 128, Art. 110746.
doi.org/10.1016/j.compeleceng.2025.110746
Independent Verification
All seven benchmark datasets cited in this report are publicly downloadable and independently verifiable. The live accuracy figure is computed programmatically from the system's real outputs — it is not manually curated or self-reported. A machine-readable version of this report is also available for programmatic access.
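The report does not give the location or schema of the machine-readable version, so the snippet below only illustrates how such an export might be consumed; the payload and every field name in it are hypothetical placeholders.

```python
import json

# Hypothetical payload standing in for the machine-readable report; the real
# export's location and schema are not specified in this document.
payload = """
{
  "report": "VERIFAID live accuracy",
  "evaluation_runs": []
}
"""

report = json.loads(payload)
for run in report["evaluation_runs"]:
    print(run.get("date"), run.get("dataset"), run.get("accuracy"))
```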