Beyond Keyword Search

Keyword search (TF-IDF, BM25) matches exact words — great when users type the right keywords, terrible when they don't. Semantic search understands meaning: "how to deploy a Next.js app" matches "deploy a React application" even without shared keywords. In 2026, implementing semantic search is practical with open-source tools and embedding APIs. Here's how to actually build it.

The Architecture


┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌──────────┐
│  Query   │───▶│ Embedding    │───▶│ Vector Search  │───▶│ Results  │
│  String  │    │ Model/API    │    │ (pgvector, etc)│    │ (ranked) │
└──────────┘    └──────────────┘    └────────────────┘    └──────────┘
                      │                      │
                      ▼                      ▼
              float32[] array        cosine similarity
              (1536 dims typical)    or approximate (ANN)
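
The right-hand side of the diagram reduces to ranking documents by cosine similarity against the query vector. Here's a minimal brute-force sketch (illustrative only — at scale, a vector database replaces the full scan with an approximate index such as HNSW, but the ranking semantics are the same):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs, top_k=3):
    # Exact (brute-force) nearest-neighbor search: score every document,
    # sort descending, keep the top_k document ids.
    scored = sorted(
        ((cosine_similarity(query_vec, vec), doc_id)
         for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:top_k]]
```

In production, `doc_vecs` is replaced by the vector database's index and the embedding model produces the vectors; nothing about the ranking logic changes.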

Embedding Model Comparison

| Model | Dimensions | Max Tokens | Cost | MTEB Score (Retrieval) | Self-Hostable |
|---|---|---|---|---|---|
| OpenAI text-embedding-3-small | 512/1536 | 8,191 | $0.02/1M tokens | 62.3 (1536d) | No |
| OpenAI text-embedding-3-large | 256/1024/3072 | 8,191 | $0.13/1M tokens | 64.6 (3072d) | No |
| Cohere Embed v4 | 1024/2048 | 8,192 | $0.10/1M tokens | 63.8 | No |
| BGE-M3 (BAAI) | 1024 | 8,192 | Free (self-host) | 62.0 | Yes (MIT) |
| jina-embeddings-v3 | 1024 | 8,192 | $0.02/1M tokens | 62.5 | No (API only) |
| gte-Qwen2-7B-instruct | 3584 | 32,768 | Free (self-host) | 66.3 (leading) | Yes (Apache 2.0) |

MTEB = Massive Text Embedding Benchmark. Higher is better. Scores from MTEB leaderboard as of early 2026.

Vector Database Options

| Database | Type | Index Types | Filtering | Best For | Pricing |
|---|---|---|---|---|---|
| pgvector (PostgreSQL) | Postgres extension | IVFFlat, HNSW | Full SQL WHERE + joins | Apps already on Postgres, metadata-rich filtering | Free (OSS, PostgreSQL license) |
| Qdrant | Dedicated vector DB | HNSW, quantization (binary, scalar, product) | Payload filtering | High performance, advanced quantization, filtering | Free (OSS) / Cloud from $25/mo |
| Pinecone | Managed vector DB | Proprietary (serverless) | Metadata filtering | Zero-ops, serverless scaling, no tuning needed | Free tier (2GB) → $0.33/GB/mo |
| Weaviate | Vector + hybrid DB | HNSW, flat, dynamic | GraphQL filtering, BM25 + vector hybrid | Hybrid search (keyword + semantic), built-in modules | Free (OSS) / Cloud from $25/mo |
| Milvus | Distributed vector DB | 12+ index types | Scalar filtering, boolean expressions | Billion-scale vectors, distributed, GPU acceleration | Free (OSS) / Cloud from $0.55/hr |

Implementation Steps

Step 1: Chunk your documents. The quality of your chunks determines the quality of your search. Common strategies:

- Fixed-size: simple; 256-512 tokens per chunk with overlap.
- Sentence-based: split on sentence boundaries.
- Recursive character splitting: LangChain's default; tries separators in order ("\n\n", "\n", ". ", " ") until chunks fit.
- Semantic chunking: use a smaller model to detect topic boundaries — most accurate, more expensive.

For most applications, recursive splitting with 512-token chunks and 50-token overlap works well. For code search, split on function/class boundaries.
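
The simplest strategy above — fixed-size chunking with overlap — can be sketched in a few lines. This version uses whitespace-separated words as a stand-in for model tokens; a real pipeline would count tokens with the embedding model's tokenizer:

```python
def chunk_text(text, chunk_size=512, overlap=50):
    # Split text into fixed-size chunks; each chunk repeats the last
    # `overlap` tokens of the previous one so context isn't cut mid-thought.
    tokens = text.split()  # word-level proxy for real tokenization
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk — cheap insurance against splitting the exact passage a query needs.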

Step 2: Generate and store embeddings. For a collection of 10,000 documents with 512-token chunks: ~15,000 chunks × 1,536 dimensions × 4 bytes = ~92 MB of vectors. This fits easily in pgvector on a small Postgres instance. Batch embedding generation: 15,000 chunks via OpenAI text-embedding-3-small = ~$0.20. Cache embeddings — don't re-embed unchanged documents.
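
The "don't re-embed unchanged documents" advice is easy to implement with a content-hash cache. A sketch, where `embed_batch` is a placeholder for whatever batch embedding call you use (e.g. a wrapper around the OpenAI API) and `cache` is any dict-like store:

```python
import hashlib

def content_key(chunk: str) -> str:
    # Hash the chunk text itself: identical content always maps to the
    # same key, so edits invalidate the cache automatically.
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

def embed_with_cache(chunks, cache, embed_batch):
    # Only pay for embeddings we haven't already computed.
    missing = [c for c in chunks if content_key(c) not in cache]
    if missing:
        for chunk, vec in zip(missing, embed_batch(missing)):
            cache[content_key(chunk)] = vec
    return [cache[content_key(c)] for c in chunks]
```

On re-indexing, only new or edited chunks hit the API; for a mostly-static corpus that turns the ~$0.20 initial cost into near-zero ongoing cost.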

Step 3: Implement search with reranking. Two-stage retrieval is the standard production architecture. Stage 1: vector search returns the top 20-50 candidates (fast, approximate). Stage 2: a reranker model (a cross-encoder such as Cohere Rerank v3 or BGE-Reranker-v2) scores those candidates more precisely and returns the top 5-10. Stage 2 adds ~50ms of latency but dramatically improves relevance: vector search alone returns results that are "in the ballpark"; reranking makes them precisely relevant.
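
The two stages wire together as below. `vector_search` and `rerank` are placeholders for your vector database query and your cross-encoder call, respectively — the sketch only fixes the shape of the pipeline:

```python
def two_stage_search(query, vector_search, rerank, first_k=50, final_k=10):
    # Stage 1: fast, approximate recall from the vector index.
    candidates = vector_search(query, top_k=first_k)
    # Stage 2: precise cross-encoder scoring of just those candidates.
    # `rerank` returns (doc, score) pairs; higher score = more relevant.
    scored = rerank(query, candidates)
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:final_k]]
```

The key design point is that the expensive cross-encoder only ever sees `first_k` documents, so its cost is constant regardless of corpus size.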

Hybrid Search: The Best of Both Worlds

Pure semantic search fails for exact match queries (searching for "error code ERR_SSL_PROTOCOL" should match the exact string, not semantically similar concepts). Pure keyword search fails for conceptual queries ("how to deploy" won't match "deployment guide"). Hybrid search combines both: run BM25 + vector search in parallel, merge results via Reciprocal Rank Fusion (RRF). Weaviate and Elasticsearch have hybrid search built in; with pgvector, you implement the combination yourself.
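
Reciprocal Rank Fusion needs only the two ranked id lists, not their raw scores, which is what makes the do-it-yourself combination on pgvector tractable. A sketch using the standard RRF formula, score(d) = Σ 1/(k + rank(d)), with the conventional k = 60:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: a list of ranked doc-id lists, e.g. [bm25_results, vector_results].
    # A document's fused score sums 1/(k + rank) over every list it appears in,
    # so documents ranked well by BOTH systems float to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works on ranks, you never have to normalize BM25 scores against cosine similarities — the two scales are incomparable, and RRF sidesteps the problem entirely.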

When Semantic Search Is Worth It

| Use Case | Semantic Search? | Why |
|---|---|---|
| Documentation search (user-facing) | Yes | Users don't know your terminology; they describe problems |
| Internal knowledge base | Yes | Employees search differently; semantic bridges the gap |
| E-commerce product search | Yes (hybrid) | "running shoes" should match "trainers" — but exact product codes need keyword |
| Legal/contract search | No (or hybrid) | Exact terminology matters; "shall" vs "may" is legally significant |
| Code search | Maybe (hybrid) | Function names need exact match; bug descriptions need semantic |

Bottom line: For most applications in 2026, the pragmatic choice is pgvector + OpenAI embeddings + a reranker. You already have Postgres, pgvector is a single extension, embeddings cost pennies per thousand documents, and the two-stage retrieval gives production-quality results. If you're doing this at scale (1M+ documents), add Qdrant or Pinecone. See also: Vector Database Comparison and RAG Best Practices.