A newsroom RAG system that indexes all internal documents, including unpublished investigation files, source communications, and embargoed reports, creates a catastrophic data security risk. Unless access controls are designed in explicitly, routine queries could inadvertently leak source identities, investigation strategies, and sensitive corporate or government information.
Access Control Architecture
Document classification: Classify all documents before indexing as Public (available for all RAG queries), Staff-only (available only to authenticated staff), Investigation-restricted (available only to named team members), or Never-index (source communications, legal correspondence, and embargoed material, all excluded from RAG entirely).

Metadata filtering: Apply access filters in the vector database query so users only retrieve documents from their authorised classification level.

Audit logging: Log all RAG queries and results for security audit purposes.

Local-only for highest sensitivity: Run the RAG system for the most sensitive document categories on local infrastructure without any external API calls.
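The classification, filtering, and audit-logging controls described here can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names Classification, Document, User, and RestrictedIndex are hypothetical, the index is an in-memory list rather than a real vector database, and a simple keyword match stands in for vector similarity search. The part that matters is the order of operations: Never-index material is rejected at ingestion, the access filter runs before any matching, and every query is logged along with what it returned.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    STAFF_ONLY = "staff_only"
    INVESTIGATION_RESTRICTED = "investigation_restricted"
    NEVER_INDEX = "never_index"


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    classification: Classification
    # Named team members; only meaningful for investigation-restricted documents.
    team: frozenset = frozenset()


@dataclass(frozen=True)
class User:
    name: str
    is_staff: bool = False


class RestrictedIndex:
    """Hypothetical in-memory stand-in for a vector store with metadata filters."""

    def __init__(self):
        self._docs = []
        self.audit_log = []

    def add(self, doc):
        # Never-index material is refused at ingestion, so it can never be retrieved.
        if doc.classification is Classification.NEVER_INDEX:
            return False
        self._docs.append(doc)
        return True

    def _allowed(self, user, doc):
        if doc.classification is Classification.PUBLIC:
            return True
        if doc.classification is Classification.STAFF_ONLY:
            return user.is_staff
        if doc.classification is Classification.INVESTIGATION_RESTRICTED:
            return user.is_staff and user.name in doc.team
        return False  # Unknown or never-index levels are denied by default.

    def query(self, user, term):
        # Apply the access filter first, then match (keyword match is an
        # assumption standing in for similarity search).
        visible = [d for d in self._docs if self._allowed(user, d)]
        hits = [d.doc_id for d in visible if term.lower() in d.text.lower()]
        # Record who asked what and which documents were returned.
        self.audit_log.append({"user": user.name, "query": term, "results": hits})
        return hits


index = RestrictedIndex()
index.add(Document("pub-1", "Council budget story", Classification.PUBLIC))
index.add(Document("staff-1", "Budget desk rota", Classification.STAFF_ONLY))
index.add(Document("inv-1", "Budget leak investigation notes",
                   Classification.INVESTIGATION_RESTRICTED, frozenset({"alice"})))
index.add(Document("src-1", "Budget source email", Classification.NEVER_INDEX))

print(index.query(User("guest"), "budget"))                # ['pub-1']
print(index.query(User("bob", is_staff=True), "budget"))   # ['pub-1', 'staff-1']
print(index.query(User("alice", is_staff=True), "budget")) # ['pub-1', 'staff-1', 'inv-1']
```

In a production system the same filter would typically be expressed as a metadata condition passed to the vector database alongside the query, so restricted documents are never fetched at all. The sketch does not cover the local-only deployment of the most sensitive categories, which is an infrastructure decision rather than a code one.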