Retrieval quality degrades over time when a RAG corpus is not maintained. Articles that were later corrected keep surfacing the old, wrong information if the outdated version stays in the corpus; duplicate content creates retrieval noise; and low-quality legacy articles dilute precision. A quarterly corpus maintenance process keeps retrieval quality high.

The Quarterly Corpus Maintenance Checklist

1. Remove superseded content: Replace articles that have been updated with corrections with the corrected version.
2. Update temporal metadata: Ensure all documents have accurate publication and last-modified dates, since retrieval systems often weight recency.
3. Deduplicate: Identify and remove near-duplicate content (the same press release published by multiple outlets; wire stories that were later replaced with original reporting).
4. Prune low-quality sources: Remove documents from sources that have since lost credibility or shut down.
5. Add new high-quality sources: Review your beat for authoritative new sources published since the last maintenance cycle.
6. Test retrieval quality: Run 20–30 benchmark queries and evaluate whether retrieved results are relevant and accurate.
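
Steps 3 and 6 lend themselves to light automation. The following is a minimal Python sketch, not a definitive implementation: it assumes documents are available as plain text keyed by an ID, flags near-duplicates using Jaccard similarity over word 3-grams, and scores a benchmark query with precision@k against a hand-labeled set of relevant document IDs. The function names, the shingle size, and the similarity threshold are illustrative assumptions and would need tuning against your own corpus.

```python
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in a lowercased text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs: dict, threshold: float = 0.7) -> list:
    """Return (id_a, id_b, score) for every document pair at or above threshold.

    Compares all pairs, so cost grows quadratically with corpus size.
    The threshold is an assumed starting point, not a recommended value.
    """
    shingle_sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(shingle_sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


def precision_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of the top-k retrieved document IDs that are labeled relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)
```

In use, `find_near_duplicates` would be run over the corpus to produce candidate pairs for an editor to review, and `precision_at_k` would be averaged over the 20–30 benchmark queries and compared with the previous quarter's figure. Because the duplicate check compares every pair of documents, a large corpus would likely need an approximate method such as MinHash instead.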