Vector Search Optimization Techniques
Vector Search Fundamentals
Vector search finds similar items by comparing their embedding vectors under a distance metric such as cosine distance. It powers semantic search, recommendation systems, and retrieval-augmented generation (RAG) applications.
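As a baseline for what an index approximates, exact search compares the query against every stored vector. The sketch below is a minimal illustration, assuming NumPy and made-up 2-D vectors; the function name cosine_top_k is hypothetical and not part of any library.

```python
import numpy as np

def cosine_top_k(query, vectors, k=3):
    """Exact (brute-force) top-k search by cosine distance."""
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - v @ q  # cosine distance, the metric behind pgvector's <=> operator
    order = np.argsort(distances)[:k]
    return order, distances[order]

# Toy 2-D "embeddings" (made-up values for illustration only).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
ids, dists = cosine_top_k(np.array([1.0, 0.0]), docs, k=2)
print(ids, dists)
```

Brute-force search costs one distance computation per stored vector, which is why approximate indexes become attractive as the collection grows.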
HNSW Index Parameters
Hierarchical Navigable Small World (HNSW) is a graph-based approximate nearest neighbor index and one of the most widely used vector index types. In pgvector:
-- pgvector HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
SET hnsw.ef_search = 100;
SELECT id, embedding <=> '[0.1, 0.2, ...]' as distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10;
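Key HNSW parameters: m (maximum connections per node, default 16), ef_construction (candidate list size during index build; defaults vary by library, with pgvector using 64 and hnswlib using 200), ef_search (candidate list size at query time; pgvector defaults to 40, and it can be set per session with SET or per transaction with SET LOCAL). Raising any of them generally improves recall at the cost of build time, memory, or query latency.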
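The recall that ef_search trades against latency can be measured by comparing the index's results with exact nearest neighbors for the same queries. A minimal sketch, assuming the result ids have already been collected into lists; recall_at_k and the sample ids are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        hits += len(set(approx[:k]) & set(exact[:k]))
    return hits / (k * len(exact_ids))

# Hypothetical results for two queries with k = 3.
exact = [[1, 2, 3], [4, 5, 6]]    # ids from an exact scan (no index)
approx = [[1, 2, 9], [4, 5, 6]]   # ids from the HNSW index
recall = recall_at_k(approx, exact, k=3)
print(f"recall@3 = {recall:.3f}")
```

Running this over a sample of real queries at several ef_search values gives a recall versus latency curve to pick an operating point from.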
Quantization
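Quantization reduces the memory footprint by storing each vector component in fewer bits, at some cost in accuracy. A minimal per-dimension INT8 scalar quantizer: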
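import numpy as np

def quantize_int8(vectors):
    # Per-dimension min and max define the range mapped onto the 256 int8 levels.
    mins = vectors.min(axis=0)
    maxs = vectors.max(axis=0)
    scale = 255.0 / (maxs - mins + 1e-8)  # epsilon guards against a zero range
    # Map to [0, 255], round to the nearest level, then shift to [-128, 127].
    shifted = np.round((vectors - mins) * scale) - 128
    quantized = np.clip(shifted, -128, 127).astype(np.int8)
    return quantized, mins, scale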
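The returned mins and scale are what make the codes usable later: they allow an approximate reconstruction of the original values. The sketch below is self-contained and repeats a rounding variant of the quantizer; dequantize_int8 is a hypothetical helper, and the random test vectors are made up.

```python
import numpy as np

def quantize_int8(vectors):
    mins = vectors.min(axis=0)
    maxs = vectors.max(axis=0)
    scale = 255.0 / (maxs - mins + 1e-8)
    codes = np.clip(np.round((vectors - mins) * scale) - 128, -128, 127)
    return codes.astype(np.int8), mins, scale

def dequantize_int8(quantized, mins, scale):
    # Invert the mapping: undo the -128 shift, then the scale and the offset.
    return (quantized.astype(np.float32) + 128.0) / scale + mins

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)  # made-up data
codes, mins, scale = quantize_int8(vectors)
restored = dequantize_int8(codes, mins, scale)

max_error = np.abs(restored - vectors).max()
step = (1.0 / scale).max()  # largest quantization step across dimensions
print(f"max error {max_error:.4f}, half step {0.5 * step:.4f}")
```

With rounding, the reconstruction error per component stays within about half a quantization step, so dimensions with a wide value range lose the most precision.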
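Relative to FP32 storage, FP16 halves memory and INT8 reduces it by 4x. Product quantization typically achieves 10-30x compression, depending on the number of subvectors and the bits per code, usually at a larger cost in recall.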
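These ratios follow directly from the bytes stored per vector. A back-of-the-envelope sketch, assuming 768-dimensional embeddings, one million vectors, and a product quantizer with 128 subvectors of 8 bits each; all of these numbers are illustrative assumptions, and codebook and index overhead are ignored.

```python
def bytes_per_vector(dim, bits_per_component):
    """Storage for one vector when every component uses the same bit width."""
    return dim * bits_per_component // 8

def pq_bytes_per_vector(num_subvectors, bits_per_code=8):
    """Storage for one product-quantized vector: one code per subvector."""
    return num_subvectors * bits_per_code // 8

dim = 768        # assumed embedding dimensionality
n = 1_000_000    # assumed collection size

fp32 = bytes_per_vector(dim, 32)              # 3072 bytes
fp16 = bytes_per_vector(dim, 16)              # 1536 bytes
int8 = bytes_per_vector(dim, 8)               # 768 bytes
pq = pq_bytes_per_vector(num_subvectors=128)  # 128 bytes, codebooks not included

for name, size in [("fp32", fp32), ("fp16", fp16), ("int8", int8), ("pq", pq)]:
    print(f"{name}: {size * n / 1e9:.2f} GB, {fp32 / size:.0f}x vs fp32")
```

Under these assumptions product quantization lands at 24x, inside the quoted range; fewer subvectors compress further but approximate each vector more coarsely.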
Pre-filtering Strategies
-- Post-filtering: the index scan runs first, then the WHERE clause filters its candidates
SELECT id, embedding <=> '[0.1, 0.2]' as distance
FROM documents
WHERE category = 'technology'
ORDER BY embedding <=> '[0.1, 0.2]' LIMIT 10;
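Post-filtering can return fewer rows than the LIMIT when the filter is selective, because only the candidates found by the index scan are checked against it. In that case prefer pre-filtering, so the search only considers matching rows. In pgvector, one option is a partial HNSW index per filter value (CREATE INDEX ... WHERE category = 'technology') when the column has few distinct values, or table partitioning when it has many.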
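The difference between the two strategies is easiest to see on a tiny example. The sketch below uses exact Euclidean search over made-up 1-D points as a stand-in for the index, so it illustrates the ordering of the steps rather than pgvector's actual execution; post_filter and pre_filter are hypothetical names.

```python
import numpy as np

def post_filter(query, vectors, categories, wanted, k):
    """Take the k nearest vectors first, then drop the ones in other categories."""
    dist = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return [int(i) for i in nearest if categories[i] == wanted]

def pre_filter(query, vectors, categories, wanted, k):
    """Restrict to the wanted category first, then take the k nearest."""
    candidates = np.flatnonzero(categories == wanted)
    dist = np.linalg.norm(vectors[candidates] - query, axis=1)
    return [int(i) for i in candidates[np.argsort(dist)[:k]]]

# Made-up data: the three points closest to the query are all "sports".
vectors = np.array([[0.1], [0.2], [0.3], [1.0], [1.1], [1.2]])
categories = np.array(["sports"] * 3 + ["technology"] * 3)
query = np.array([0.0])

post = post_filter(query, vectors, categories, "technology", k=3)
pre = pre_filter(query, vectors, categories, "technology", k=3)
print("post-filter:", post)
print("pre-filter: ", pre)
```

Here post-filtering returns nothing because every one of the k nearest candidates is discarded by the filter, while pre-filtering still returns k matching rows.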
Conclusion
Tune HNSW parameters for your data distribution. Use quantization to reduce memory. Benchmark with your actual data. Monitor recall in production.