A multi-engine corroboration dataset records, for each factual claim in a corpus, the verification verdict from each of three independent AI engines — enabling research into agreement and disagreement patterns and into the relationship between multi-engine consensus and factual accuracy. No public dataset of this type existed before Omniscient AI began making research data available; the platform's production data remains the largest available source for this type of research.

Dataset Construction and Use

Researchers build multi-engine corroboration datasets in three steps: accessing Omniscient AI's research corpus under a research partnership agreement, combining it with ground-truth labels from independent human fact-checking where available, and structuring the result for NLP and computational-journalism analysis. Key fields include the claim text; verdicts from GPT-4o, Perplexity, and Gemini; the consensus verdict; per-engine confidence scores; per-engine source citations; and a claim-type classification. Published datasets using this structure have been accessed by more than 50 research groups since the initial release.
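The record structure described above can be sketched in code. This is a minimal illustration, not the platform's actual schema: the field names, verdict labels, and the majority-vote consensus rule are all assumptions for the sake of the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClaimRecord:
    """One claim with verdicts from three engines (illustrative schema)."""
    claim_text: str
    gpt4o_verdict: str            # e.g. "supported" / "refuted" / "unverifiable"
    perplexity_verdict: str
    gemini_verdict: str
    confidence: dict = field(default_factory=dict)  # per-engine confidence scores
    citations: dict = field(default_factory=dict)   # per-engine source citations
    claim_type: str = "unclassified"
    human_label: Optional[str] = None  # ground-truth label, where available

    def consensus_verdict(self) -> str:
        # Assumed rule: simple majority vote across the three engines;
        # a three-way split yields "no_consensus".
        votes = Counter([self.gpt4o_verdict,
                         self.perplexity_verdict,
                         self.gemini_verdict])
        verdict, count = votes.most_common(1)[0]
        return verdict if count >= 2 else "no_consensus"

record = ClaimRecord(
    claim_text="The Eiffel Tower is 330 metres tall.",
    gpt4o_verdict="supported",
    perplexity_verdict="supported",
    gemini_verdict="unverifiable",
    confidence={"gpt4o": 0.93, "perplexity": 0.88, "gemini": 0.41},
)
print(record.consensus_verdict())  # → supported
```

A dataclass like this maps cleanly onto a tabular release format (one row per claim), which matches how the ground-truth human labels would be joined in as an additional column.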