Why Vector Search Alone Is Not Enough
Vector search has become the default retrieval method for RAG applications. The premise is elegant: convert documents and queries into high-dimensional embeddings, then find the closest matches in vector space. It captures semantic meaning, so a query about "automobile fuel efficiency" will match documents about "car miles per gallon" even though the words differ entirely.
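The "closest matches in vector space" step usually means ranking documents by cosine similarity between embeddings. The following is a minimal sketch with toy 4-dimensional vectors standing in for real model embeddings (which typically have hundreds or thousands of dimensions); the vectors and labels here are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not from a real model).
query = [0.9, 0.1, 0.3, 0.0]   # e.g. "automobile fuel efficiency"
doc_a = [0.8, 0.2, 0.4, 0.1]   # e.g. "car miles per gallon"
doc_b = [0.1, 0.9, 0.0, 0.7]   # e.g. an unrelated document

# Retrieval = rank documents by similarity to the query vector.
ranked = sorted([("doc_a", doc_a), ("doc_b", doc_b)],
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)
```

Because ranking depends only on vector geometry, semantically related texts score high even with zero word overlap, which is exactly the strength described above.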
But vector search has blind spots. It struggles with exact-match requirements — product codes, legal clause numbers, proper nouns, and technical identifiers that must be matched precisely. A vector embedding of "ISO 27001" will be semantically close to other security standards, but a user searching for that specific standard needs exact matches, not approximate ones. Vector search also tends to underperform on short, specific queries where the embedding does not capture enough context to disambiguate meaning.
The Enduring Strength of Keyword Search
Keyword search, built on algorithms like BM25 and TF-IDF, has powered information retrieval for decades. These methods excel at precisely what vector search struggles with: exact matching, handling rare terms, and performing well on short queries. When a user searches for a specific error code or a person's name, keyword search reliably surfaces the right documents because it matches tokens directly rather than relying on learned semantic representations.
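To make the token-matching behavior concrete, here is a compact, self-contained BM25 scorer. This is a sketch of the standard BM25 formula, not production retrieval code; the example documents and the default parameters k1=1.5, b=0.75 are illustrative choices.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many documents each term appears.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue  # BM25 only rewards exact token matches
            # Smoothed inverse document frequency (kept non-negative).
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with length normalization.
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "error code E1234 raised during startup".split(),
    "general troubleshooting guide for startup issues".split(),
]
scores = bm25_scores(["E1234"], docs)
```

A query for the rare identifier "E1234" scores the first document highly and the second zero, mirroring the exact-match strength described above: the rare term's high inverse document frequency dominates the score.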
The weaknesses of keyword search are equally well known. It misses synonyms, cannot handle paraphrasing, and fails when the user's vocabulary differs from the document's vocabulary. A search for "heart attack treatment" will miss documents that only use "myocardial infarction management," even though they address the same topic.
