Vector Database Tuning: Index Parameters, Search Configuration, and Hybrid Search


Vector databases are the backbone of modern RAG applications, but default configurations rarely give optimal results. Tuning your vector database for your specific data distribution and query patterns can improve recall by 20% or more while reducing latency. This guide walks through the main tuning levers: index parameters, search configuration, and hybrid search.





Understanding Index Types





Vector databases support multiple index types, each with different trade-offs between search speed, memory usage, and recall accuracy.





HNSW (Hierarchical Navigable Small World) is the default choice for most applications. It offers excellent search speed and recall. The trade-offs are higher memory usage and slower index building. HNSW is ideal when your dataset fits in memory and query speed is critical.





IVF (Inverted File) indexing is more memory-efficient than HNSW. It partitions vectors into clusters and searches only the nearest clusters during a query. IVF requires more tuning to balance speed and accuracy but uses significantly less memory.
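As a rough sketch of what an IVF setup looks like, here is a minimal example using FAISS; the dimension, cluster count (nlist), and probe count (nprobe) are illustrative placeholders, not recommendations for your data.

```python
import numpy as np
import faiss  # assumes faiss-cpu (or faiss-gpu) is installed

dim = 768
vectors = np.random.rand(100_000, dim).astype("float32")  # placeholder embeddings

# Partition the space into nlist clusters; more clusters = finer partitions,
# but each query then needs to probe more of them for the same recall.
nlist = 1024
quantizer = faiss.IndexFlatL2(dim)              # coarse quantizer for centroids
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

index.train(vectors)   # learn cluster centroids (required before adding)
index.add(vectors)

# nprobe: how many clusters to scan per query; higher = better recall, slower.
index.nprobe = 16
distances, ids = index.search(vectors[:5], k=10)
```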





There is no universal best index. HNSW is a safe starting point for most applications. Switch to IVF if memory is constrained. Test both on your data with your query patterns before committing.





HNSW Parameter Tuning





HNSW has two critical parameters: M and ef_construction. M controls the number of connections per node. Higher M values improve recall at the cost of memory. A value of 16 is a good default. For high-recall requirements on smaller datasets, use 32. For large datasets where memory is constrained, use 8.





The ef_construction parameter controls the size of the dynamic candidate list during index construction. Higher values produce better recall at the cost of slower index builds. Start with 200 and adjust based on your build-time tolerance.





The search-time parameter ef controls the candidate list size during query. Higher ef values improve recall but increase latency. Start with 50 and tune based on latency budgets. For every 2x increase in ef, recall improves by roughly 1-3% while latency increases linearly.
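A minimal sketch of these three parameters in practice, using hnswlib; the dimension and data are placeholders, and the parameter values are simply the starting points suggested above.

```python
import numpy as np
import hnswlib

dim = 768
data = np.random.rand(50_000, dim).astype("float32")  # placeholder embeddings

index = hnswlib.Index(space="cosine", dim=dim)
# M: connections per node; ef_construction: candidate list size while building.
index.init_index(max_elements=data.shape[0], M=16, ef_construction=200)
index.add_items(data, np.arange(data.shape[0]))

# ef: search-time candidate list size; raise it until recall meets your target
# or latency exceeds your budget.
index.set_ef(50)
labels, distances = index.knn_query(data[:5], k=10)
```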





Distance Metrics





The choice of distance metric affects both retrieval quality and performance. Cosine similarity is the default for text embeddings. It measures the angle between vectors and is appropriate for normalized embeddings.





L2 (Euclidean) distance measures straight-line distance in vector space. For normalized embeddings, cosine similarity and L2 produce identical rankings: for unit-length vectors, squared L2 distance equals 2 minus twice the cosine similarity, so ordering by either metric returns the same neighbors. For unnormalized embeddings, the choice matters.
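A quick numpy check of that equivalence, with illustrative random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # normalize to unit length

cos_sim = float(a @ b)
l2_sq = float(np.sum((a - b) ** 2))

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert np.isclose(l2_sq, 2 - 2 * cos_sim)
```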





Inner product works well for embeddings trained with an inner-product loss. Some embedding models, such as text-embedding-ada-002, are intended for use with cosine similarity, while others are optimized for inner product.





Use the distance metric that matches your embedding model's training objective; check your embedding model's documentation. Using the wrong metric can silently degrade retrieval quality by 5-15%.





Search Configuration





Beyond index type and parameters, search configuration affects quality. The number of results returned (top_k) is the most important setting.





Optimal top_k depends on your RAG pipeline. If the LLM can handle large context windows, return more results (10 to 20) and let the model select relevant information. If the LLM has limited context, return fewer results (3 to 5).





Consider rescoring after initial retrieval. Run a fast initial search with a small ef to get 50 candidates, then rescore those candidates with a more accurate method or a different model. This two-stage approach balances speed and quality.
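A sketch of that two-stage pattern, assuming an hnswlib-style index like the one above and exact dot-product rescoring as the second stage; the exact rescorer is a stand-in for whatever more accurate method or model you use.

```python
import numpy as np

def two_stage_search(index, query_vec, doc_vectors, candidates=50, top_k=5):
    # Stage 1: fast approximate search with a modest ef for a wide candidate set.
    index.set_ef(candidates)
    labels, _ = index.knn_query(query_vec, k=candidates)
    candidate_ids = labels[0]

    # Stage 2: rescore candidates exactly (or with a cross-encoder) and re-rank.
    exact_scores = doc_vectors[candidate_ids] @ query_vec
    reranked = candidate_ids[np.argsort(-exact_scores)]
    return reranked[:top_k]
```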





Filters and pre-filters affect search behavior. Pre-filtering before vector search reduces the search space but can degrade recall if the filter eliminates relevant results. Post-filtering provides better recall but may return too few results after filtering.
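One common workaround for post-filtering returning too few results is to over-fetch and filter down. A hypothetical sketch, where fetch_multiplier and the predicate are illustrative and the index exposes a knn_query-style interface:

```python
def post_filtered_search(index, query_vec, predicate, top_k=5, fetch_multiplier=4):
    # Over-fetch so that enough results survive the metadata filter.
    labels, distances = index.knn_query(query_vec, k=top_k * fetch_multiplier)
    results = [
        (doc_id, dist)
        for doc_id, dist in zip(labels[0], distances[0])
        if predicate(doc_id)          # metadata check, e.g. date or tenant filter
    ]
    return results[:top_k]
```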





Hybrid Search





Hybrid search combines vector similarity with keyword matching. It catches exact matches that vector search might miss and handles queries where vector similarity alone performs poorly.





Implement hybrid search with weighted scoring: assign one weight to the vector similarity score and another to the keyword score, then combine them. Start with 70% vector, 30% keyword and adjust based on your query analysis.
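A minimal sketch of that weighted fusion, assuming you already have a vector score and a keyword score per document; the min-max normalization and the 0.7/0.3 split are starting points, not fixed choices.

```python
def normalize(scores):
    # Min-max normalize a dict of {doc_id: score} to the [0, 1] range.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc_id: (s - lo) / span for doc_id, s in scores.items()}

def hybrid_scores(vector_scores, bm25_scores, vector_weight=0.7):
    v, k = normalize(vector_scores), normalize(bm25_scores)
    doc_ids = set(v) | set(k)
    return {
        doc_id: vector_weight * v.get(doc_id, 0.0)
                + (1 - vector_weight) * k.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
```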





BM25 is the standard keyword scoring algorithm. It works well for exact term matching and complements vector search. Most vector databases support BM25 in some form.





Analyze your query patterns to tune the hybrid weight. Queries with specific terminology benefit more from keyword weight. Conversational queries benefit more from vector weight. If possible, tune the weight per query type.





Monitoring and Maintenance





Vector database performance degrades over time as data is added and updated. Monitor recall, latency, and memory usage monthly.
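Recall is typically measured against exact brute-force search over a sample of queries. A sketch, assuming numpy arrays of (normalized) embeddings and an ANN index with a knn_query-style interface:

```python
import numpy as np

def recall_at_k(index, doc_vectors, query_vectors, k=10):
    # Ground truth: exact top-k neighbors by inner product, via brute force.
    exact = np.argsort(-(query_vectors @ doc_vectors.T), axis=1)[:, :k]
    approx, _ = index.knn_query(query_vectors, k=k)

    hits = sum(
        len(set(exact[i]) & set(approx[i])) for i in range(len(query_vectors))
    )
    return hits / (len(query_vectors) * k)
```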





Rebuild indexes periodically. Index fragmentation reduces performance over time. Schedule monthly or quarterly index rebuilds during low-traffic periods.





Vacuum deleted vectors. Deleted vectors in IVF and HNSW indexes are marked but not removed. Over time, these phantom entries degrade performance. Regular cleanup restores index efficiency.





Test index changes on a production copy before deploying. A parameter change that improves recall by 5% but doubles latency is not an improvement if latency was already marginal.





Vector database tuning is iterative. Start with defaults, benchmark your recall and latency, adjust one parameter at a time, and measure the impact. The optimal configuration depends on your data, your queries, and your performance requirements.