
🎬 Episode 2: Building on RAG: Exploring BM25 and Semantic Search

Retrieval-Augmented Generation (RAG) depends on effective search. BM25 offers fast, keyword-based precision, while Semantic Search uses embeddings to capture meaning and context. Hybrid Search blends both approaches, combining exact matches with semantic understanding to deliver more accurate and reliable retrieval for AI systems.
Welcome back to our RAG series! In Episode 1 [Link], we introduced the concept of Retrieval-Augmented Generation (RAG) and discussed its significance in enhancing AI responses by retrieving relevant information from external sources. Building on that foundation, this episode delves deeper into the mechanics of RAG by exploring two pivotal components of information retrieval: BM25 and Semantic Search. Understanding these methods is crucial for refining the retrieval process within RAG systems.
- BM25: A traditional, keyword-based ranking function that evaluates the relevance of documents based on the frequency of query terms. It's known for its efficiency and effectiveness in scenarios where exact term matching is paramount.
- Semantic Search: A more advanced approach that leverages embeddings and vector databases to capture the meaning behind words, enabling the retrieval of documents that are contextually relevant even if they don't contain the exact query terms.
In the following sections, we'll explore each of these methods in detail, examining their mechanisms, strengths, and how they can be integrated into RAG systems to enhance performance and reliability.
🔍 BM25 (Best Match 25): the workhorse of lexical retrieval
BM25 is a ranking function used by search engines (e.g., Lucene/Elasticsearch) to score how well each document matches a query using words, not meanings. It balances three ideas: (1) rare terms are more informative, (2) repeating a term helps but with diminishing returns, and (3) longer documents shouldn't win just because they're long.
How BM25 scores a match (intuition, not scary math)
For each query term, BM25 computes a score and sums them:
- IDF (inverse document frequency): rarer terms get bigger weight; very common terms contribute little.
- Term frequency saturation: the more a term appears in a document, the higher the score, but with a saturation controlled by k1 (typically between ~1 and 2).
- Length normalization: longer-than-average docs are penalized via parameter b (between 0 and 1; often around 0.75).
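To make these three ideas concrete, here is a minimal, illustrative sketch of a Lucene-style BM25 score for a single query term (real engines sum this over all query terms and add tokenization, stemming, and many optimizations; the numbers below are made up):

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.5, b=0.75):
    """Lucene-style BM25 contribution of one query term to one document."""
    # (1) IDF: rare terms (low doc_freq) get a bigger weight.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # (3) Length normalization: docs longer than average are penalized via b.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # (2) Term-frequency saturation: repeats help, with diminishing returns via k1.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Same term frequency, but the second document is four times longer:
print(bm25_term_score(tf=3, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10))
print(bm25_term_score(tf=3, doc_len=400, avg_doc_len=100, n_docs=1000, doc_freq=10))  # lower
```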
Tuning tips:
- Increase k1 → give more credit to repeated terms.
- Increase b → normalize more aggressively against long documents.
When BM25 shines
- Exact phrases, IDs, codes, product names, error messages, legal citations.
- You want fast, dependable retrieval without embeddings.
- As the sparse half of hybrid retrieval (paired with a vector retriever in RAG).
⚡ Minimal BM25 in Python with bm25s
bm25s is a modern, fast, low-dependency BM25 library that benchmarks near Elasticsearch speed on a single node while staying pure Python. It supports popular BM25 variants (Lucene/ATIRE/BM25+/BM25L) and integrates with Hugging Face for saving/loading indices.
```python
# pip install bm25s
import bm25s

# A tiny corpus
corpus = [
    "A cat is a small domesticated carnivorous mammal.",
    "Dogs are loyal companions that love to play.",
    "Birds can fly long distances and sing beautifully.",
    "Fish live in water and do not purr.",
]

# 1) Build the index (tokenize + index)
tokens = bm25s.tokenize(corpus)  # basic tokenizer; you can customize
retriever = bm25s.BM25(corpus=corpus, method="lucene")  # choose a variant
retriever.index(tokens)

# 2) Query
query = "Do fish purr like a cat?"
q_tokens = bm25s.tokenize(query)
docs, scores = retriever.retrieve(q_tokens, k=3)  # top-3 results

for rank, (doc, score) in enumerate(zip(docs[0], scores[0]), start=1):
    print(f"{rank}. score={score:.2f} | {doc}")
```
What you'll observe:
- The "cat" sentence ranks high because of term matches ("cat", "purr").
- The "fish" sentence also ranks (it negates purring), helped by exact matches ("fish").
- No embeddings are used, just words and BM25 scoring.
Notes: you can switch variants with method="bm25+" / "bm25l" / "atire", and (in BM25-style systems) tune k1 and b to control saturation and length normalization, as sketched below.
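Both knobs can be set when constructing the retriever. A hedged sketch, assuming the k1 and b constructor parameters documented by bm25s (check the exact names in your version), continuing the example above:

```python
# More credit for repeated terms, stronger length normalization (illustrative values):
retriever = bm25s.BM25(corpus=corpus, method="lucene", k1=1.8, b=0.9)
retriever.index(tokens)
```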
🧩 Using BM25 inside LlamaIndex
LlamaIndex exposes a BM25Retriever (powered by bm25s under the hood) with convenient persistence, metadata filtering, and hybrid-combo examples.
```python
# pip install llama-index llama-index-retrievers-bm25 PyStemmer
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.retrievers.bm25 import BM25Retriever
import Stemmer

# 1) Tiny document set
docs = [
    Document(text="BM25 ranks documents using word matches and IDF."),
    Document(text="Vector search uses embeddings to capture meaning."),
    Document(text="Elasticsearch uses BM25 for lexical scoring by default."),
]

# 2) Chunk into nodes
splitter = SentenceSplitter(chunk_size=256)
nodes = splitter.get_nodes_from_documents(docs)

# 3) Create a BM25 retriever (with stemming & stopwords for English)
bm25 = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=3,
    stemmer=Stemmer.Stemmer("english"),
    language="english",
)

# 4) Retrieve
results = bm25.retrieve("How does Elasticsearch score text?")
for i, node in enumerate(results, 1):
    print(f"{i}. score={node.score:.3f} | {node.text}")
```
This is production-friendly: you can persist the index to disk, use a docstore, apply metadata filters, or combine it with a vector store for hybrid retrieval.
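For example, persisting and reloading the index looks roughly like this (a sketch based on the persist/from_persist_dir methods in the llama-index-retrievers-bm25 docs; the path is arbitrary):

```python
# Save the BM25 index to disk, then load it back in another process
bm25.persist("./bm25_index")
loaded = BM25Retriever.from_persist_dir("./bm25_index")
results = loaded.retrieve("How does Elasticsearch score text?")
```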
Takeaways
- BM25 is fast, proven, and robust for keyword-heavy queries.
- Two knobs matter most: k1 (term saturation) and b (length normalization).
- Libraries like bm25s make BM25 easy and fast in Python; frameworks like LlamaIndex let you plug it into a full RAG stack (and fuse with dense retrieval later).
🧠 Semantic Search: When Meaning Matters More Than Words
Semantic search is all about finding what you mean, not just what you say. It uses embeddings (compressed representations of language) to surface results based on context, intent, and similarity in meaning. This is what powers smarter AI systems and RAG pipelines.
What Is Semantic Search?
Semantic search interprets the meaning behind queries. Unlike keyword search, which matches literal words, semantic search relies on vector similarity, so it can treat "chocolate milk" and "milk chocolate" differently (because the meaning changes with word order) and understand that "football" might mean "soccer" or "American football" depending on context.
How Does It Work?
- Embedding Generation: Both your documents and search queries are transformed into embeddings, high-dimensional numeric vectors capturing meaning.
- Vector Storage in Databases: These embeddings are stored in specialized vector databases, optimized to find nearest vectors efficiently, even in large datasets.
- Similarity Search (k-NN): When a query comes in, it's embedded and compared against stored vectors using similarity metrics (like cosine similarity; see the sketch after this list). The system returns the closest matches, those conceptually most similar to your query.
- Efficient Indexing Techniques: To speed up searches over massive embedding sets, vector databases use approximate nearest neighbor strategies like HNSW (Hierarchical Navigable Small World graphs).
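As a quick illustration of the similarity step, here is cosine similarity computed with plain NumPy (a self-contained sketch with made-up three-dimensional vectors; real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), 0.0 = orthogonal (unrelated)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query_vec = np.array([0.2, 0.7, 0.1])
doc_vec = np.array([0.25, 0.65, 0.05])
print(cosine_similarity(query_vec, doc_vec))  # close to 1.0
```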
Why It Matters
- Understands Context & Synonyms: It can connect semantically related concepts even if they don't share words. Search for "electric cars" → "EV," "Tesla," or "hybrid vehicle."
- Delivers Intent-Rich Results: Semantic search interprets deeper meaning, delivering results based on intent, not just matching keywords.
- Handles Natural Language Queries Better: You can type "stylish couch for small living room," and it understands the query's meaning, not just the keywords.
- Crucial for RAG: Embeddings and vector search enable RAG systems to fetch documents with relevant meaning, not just literal overlap. This reduces hallucinations and enhances grounding.
Quick Python Example: Semantic Search with Embeddings + FAISS
```python
# pip install sentence-transformers faiss-cpu
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# 1. Sample corpus and embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
docs = [
    "Electric cars are the future of transportation.",
    "Healthy smoothie recipes for breakfast.",
    "Python programming tips and tricks.",
    "Tesla unveils new self-driving car model.",
]

# 2. Compute embeddings
doc_embeddings = model.encode(docs, convert_to_numpy=True)

# 3. Build FAISS index (vector DB)
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(doc_embeddings)

# 4. Query and retrieve
query = "latest developments in autonomous electric vehicles"
query_vector = model.encode([query], convert_to_numpy=True)
distances, indices = index.search(query_vector, k=2)

for dist, idx in zip(distances[0], indices[0]):
    print(f"Distance: {dist:.4f} | {docs[idx]}")  # lower L2 distance = more similar
```
What to expect:
- The query about autonomous electric vehicles should bring up the Tesla/self-driving car doc, even though the wording doesn't exactly match.
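Note that IndexFlatL2 returns L2 distances, so lower values mean closer matches. If you prefer cosine similarity, a common variant is to L2-normalize the embeddings and use an inner-product index (a sketch continuing the example above):

```python
# Cosine similarity = inner product of L2-normalized vectors
faiss.normalize_L2(doc_embeddings)          # normalizes in place
ip_index = faiss.IndexFlatIP(dimension)
ip_index.add(doc_embeddings)
faiss.normalize_L2(query_vector)
scores, indices = ip_index.search(query_vector, k=2)  # higher score = more similar
```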
Summary: Semantic Search in a Nutshell
| Component | Function |
|---|---|
| Embeddings | Represent meaning of text numerically |
| Vector database | Efficient storage & retrieval of embeddings |
| Similarity search | Finds conceptually closest matches (k-NN) |
| Indexing (e.g., HNSW) | Speeds up search in high-dimensional spaces |
Semantic search bridges the gap between human intent and machine understanding, making RAG systems more accurate and intuitive.
🧩 Hybrid Search: Merging Precision with Context
Hybrid search is a technique that combines keyword (lexical) search with semantic (vector) search to leverage the strengths of both approaches. This fusion enhances search relevance by considering both exact term matching and semantic context, providing more accurate and comprehensive search results.
🔍 What Is Hybrid Search?
Hybrid search operates by running both keyword and vector searches in parallel and then merging their results. The combination can be achieved through various fusion methods, such as:
- Ranked Fusion: Merges results based on their ranks from both search types (for example, Reciprocal Rank Fusion, sketched below).
- Relative Score Fusion: Combines results by normalizing and merging their scores.
This approach allows systems to benefit from the exactness of keyword matches and the contextual understanding of semantic searches, making it versatile and robust for a wide range of search use cases.
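One widely used ranked-fusion method is Reciprocal Rank Fusion (RRF), which scores each document by summing 1/(k + rank) across the result lists it appears in. A minimal sketch (the constant k=60 is the conventional default; the document IDs are made up):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document IDs into a single RRF ranking."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Top-ranked documents contribute the most; k dampens outliers.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # lexical ranking
vector_hits = ["doc1", "doc5", "doc3"]  # semantic ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # doc1 and doc3 rise to the top
```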
⚙️ How Does Hybrid Search Work?
- Keyword Search: Utilizes traditional keyword-based methods like BM25 to find documents that match the query terms exactly.
- Semantic Search: Employs vector embeddings to understand the meaning behind the query and retrieve documents with similar semantic content.
- Fusion: Combines the results from both searches using a fusion algorithm to produce a final ranking of documents.
This process ensures that the search system considers both the literal terms in the query and the underlying meaning, leading to more relevant and accurate results.
🧪 Example: Implementing Hybrid Search in OpenSearch
OpenSearch supports hybrid search natively (generally available since version 2.10), allowing users to combine keyword and semantic queries effectively. The fusion step is configured as a search pipeline with a normalization processor, which the hybrid query then runs through. Here's a sketch of a hybrid search setup (index, field, and pipeline names are illustrative, and the query vector is truncated):
```
# 1) Create a search pipeline that normalizes and combines scores
PUT /_search/pipeline/hybrid-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.5, 0.5] }
        }
      }
    }
  ]
}

# 2) Run the hybrid query through that pipeline
POST /my-index/_search?search_pipeline=hybrid-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "title": "AI advancements" } },
        {
          "knn": {
            "content_vector": {
              "vector": [0.12, 0.34, 0.56, ...],
              "k": 5
            }
          }
        }
      ]
    }
  }
}
```
In this example:
- The match query retrieves documents whose title contains "AI advancements".
- The knn query retrieves documents whose content_vector embeddings are close to the provided query vector.
- The normalization-processor rescales the scores from both queries (min-max here) and combines them with a weighted arithmetic mean, with equal weights assigned to the keyword and vector sides.

This setup allows OpenSearch to return documents that are both semantically relevant and contain the exact keywords, enhancing the overall search experience.
💡 Why Use Hybrid Search?
- Enhanced Relevance: Combines the precision of keyword search with the contextual understanding of semantic search.
- Flexibility: Allows fine-tuning of the contribution of each search type through fusion parameters.
- Improved User Experience: Delivers more accurate and comprehensive search results, meeting diverse user needs.
📈 The Evolution of Search: From Precision to Understanding
In the realm of Retrieval-Augmented Generation (RAG), the effectiveness of AI systems hinges on their ability to retrieve and process information accurately. The journey of search methodologies reflects this progression:
- Keyword Search: The bedrock of traditional search engines, keyword search offers precision by matching exact terms. It's akin to a meticulous librarian who retrieves information based solely on specific queries.
- Semantic Search: Advancing beyond mere keyword matching, semantic search understands the context and meaning behind queries. This approach is like conversing with an expert who comprehends the nuances of your questions.
- Hybrid Search: Integrating the strengths of both keyword and semantic search, hybrid search combines precision with contextual understanding. It's as if you have a team of specialists, each excelling in different areas, collaborating to provide the most accurate and relevant information.
💡 Stay tuned! In the next post, we'll dive deeper into RAG pipelines and discuss how to evaluate chunk retrieval, showing how BM25, semantic search, and hybrid methods work together to make AI systems smarter and more reliable.