What Is Vector Search?

Vector search (also called semantic search or embedding-based search) is a retrieval technique that represents both documents and queries as numerical vectors in a high-dimensional space, enabling search by meaning rather than by keyword matching. Unlike traditional keyword search, which returns documents containing the exact terms queried, vector search finds documents that are semantically similar to the query, even when they use entirely different vocabulary.

The technology works by encoding text (whether a news article, a query, or a document excerpt) into a numerical vector using an embedding model, such as OpenAI's text-embedding-3-small, Google's text-embedding-004, or open-source models like bge-large-en. Two texts that discuss the same concept are encoded as vectors that lie close together in the embedding space, with closeness typically measured by cosine similarity or Euclidean distance. Ranking documents by their closeness to the query vector is what lets similarity search match concepts rather than words.
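The idea of "closeness" in embedding space can be illustrated with a minimal sketch. The three-dimensional vectors and headlines below are hand-made, hypothetical stand-ins for real embeddings (which usually have hundreds or thousands of dimensions and would come from an embedding model), and the helper names are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, archive, k=2):
    """Return the k archive headlines whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in archive.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Hypothetical toy vectors; a real system would obtain these from an embedding model.
archive = {
    "Minister's investor ties questioned": [0.9, 0.1, 0.2],
    "Officials charged in bribery probe": [0.8, 0.2, 0.1],
    "City marathon draws record crowd": [0.1, 0.9, 0.3],
}
query = [0.85, 0.15, 0.15]  # pretend embedding of a query about government corruption

print(top_k(query, archive))
```

This brute-force scan compares the query against every document, which is fine for a toy archive but is the step that approximate nearest-neighbour indexes are designed to speed up at scale.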

Why Vector Search Transforms Newsroom Archive Access

Every large newsroom sits atop an archive of immense journalistic value: decades of reporting that captures expertise, established sources, investigative precedents, and context for current stories. The problem is that traditional keyword search often fails to surface this value. A journalist researching corruption in a specific ministry will not find all relevant prior coverage unless they know the exact terms used in each article. A story about a politician's past involvement in a financial scandal may be buried in an article that used the word "investor" rather than "corrupt". Such an article is invisible to a keyword search for "corrupt", but semantic search can surface it because it matches on meaning.

Vector search enables journalists to query an archive with natural language — "stories about government officials who later faced corruption charges" — and retrieve semantically relevant articles that may not share a single word with the query. This is transformative for investigative journalism, where connecting historical dots is often the key to breaking new stories.

Implementation: pgvector and the Omniscient AI Approach

For production newsroom retrieval-augmented generation (RAG) systems, pgvector, an open-source vector similarity search extension for PostgreSQL, is an increasingly popular choice because it adds vector search to the standard relational database that many newsrooms already operate. pgvector adds a vector column type to PostgreSQL for storing document embeddings and supports approximate nearest neighbour (ANN) search with HNSW or IVFFlat indexes. Query times of roughly 1–50ms on million-document archives are achievable, though actual latency depends on hardware, index parameters, and embedding dimension.
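As a rough sketch of what this can look like in practice, the snippet below holds the SQL for a hypothetical articles table with an HNSW index and a cosine-distance query. The table name, column names, three-dimensional vector size, and helper functions are assumptions made for illustration and are not taken from any particular newsroom system. The %s placeholders assume a psycopg-style PostgreSQL driver, and nothing here connects to a database.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Hypothetical schema: 3 dimensions for illustration only; the dimension
# must match the output size of whichever embedding model is used.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS articles (
    id bigserial PRIMARY KEY,
    headline text NOT NULL,
    embedding vector(3)
);
CREATE INDEX IF NOT EXISTS articles_embedding_hnsw
    ON articles USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, headline, embedding <=> %s::vector AS cosine_distance
FROM articles
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def search_params(query_embedding, limit=5):
    """Build the parameter tuple for SEARCH_SQL from a query embedding."""
    literal = to_vector_literal(query_embedding)
    return (literal, literal, limit)

print(search_params([0.85, 0.15, 0.15], limit=2))
```

With a driver such as psycopg, these would typically be passed along the lines of cursor.execute(SEARCH_SQL, search_params(query_embedding)). Choosing vector_cosine_ops for the index matches the <=> operator in the query, which is what allows PostgreSQL to use the HNSW index for the ORDER BY.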

Omniscient AI's fact-checking infrastructure uses pgvector to store embeddings of more than 1,200 news and fact-check sources, enabling real-time semantic retrieval of relevant passages for any factual query in under 100ms — fast enough to power the extension's real-time fact-checking interface.