## Introduction
Embeddings are the foundation of modern AI applications — powering semantic search, recommendation systems, clustering, and classification. At their core, embeddings convert text, images, or other data into dense vector representations that capture semantic meaning. This guide explains what embeddings are, how they work, and how to use them effectively.
## What Are Embeddings?
An embedding is a numerical representation of data — typically a list of floating-point numbers — where the position in vector space encodes semantic meaning. Similar concepts cluster together, while dissimilar concepts are far apart.
For example, the embeddings for "king" and "queen" would be closer to each other than to "bicycle." More importantly, embeddings capture analogical relationships: the vector operation `king - man + woman` produces a vector close to `queen`.
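This analogy arithmetic is easiest to demonstrate with classic word vectors. Here is a minimal sketch using gensim and pretrained GloVe vectors (the `glove-wiki-gigaword-100` model name is an assumption about what gensim's downloader provides):

```python
# Sketch: the classic king - man + woman analogy with pretrained GloVe vectors.
# Assumes gensim is installed and can fetch "glove-wiki-gigaword-100" (~130 MB).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest neighbors of the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" should rank at or near the top
```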
Modern embedding models produce vectors with 384 to 3072 dimensions. The trade-off is speed versus information density: smaller vectors are faster to compare but capture less nuance.
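Dimension count also drives storage cost, which matters at scale. A back-of-envelope calculation for a hypothetical corpus (the corpus size is illustrative):

```python
# Storage cost of float32 embeddings for a hypothetical 1M-document corpus
n_vectors = 1_000_000
for dims in (384, 1024, 3072):
    gb = n_vectors * dims * 4 / 1e9  # 4 bytes per float32
    print(f"{dims} dims: {gb:.1f} GB")
# 384 dims: 1.5 GB, 1024 dims: 4.1 GB, 3072 dims: 12.3 GB
```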
## How Embeddings Are Generated
Embedding models are trained using contrastive learning. The model learns to pull semantically similar texts together in vector space while pushing dissimilar texts apart.
The training signal typically comes from pairs of texts known to be related, such as question-answer pairs, paraphrases, or query-click data. The most common loss function is **contrastive loss** or **InfoNCE loss**, which treats each example's paired text as the positive and the other texts in the same batch as negatives, providing training signal at scale.
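To make the in-batch idea concrete, here is a minimal InfoNCE sketch in PyTorch (the temperature of 0.05 and the paired-batch setup are assumptions for illustration, not any particular model's recipe):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, doc_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE: query i's positive is doc i; all other docs are negatives."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # (batch, batch) matrix of cosine similarities; entry [i, j] compares
    # query i against document j
    logits = query_emb @ doc_emb.T / temperature
    # The correct "class" for row i is column i (the paired document)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```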
## Popular Embedding Models
| Model | Dimensions | Best For | Size |
|-------|-----------|----------|------|
| text-embedding-3-small | 512-1536 (configurable) | General purpose | API-only |
| text-embedding-3-large | 256-3072 (configurable) | High accuracy | API-only |
| BAAI/bge-large-en-v1.5 | 1024 | RAG and search | 1.3 GB |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Speed-critical apps | 80 MB |
| intfloat/e5-mistral-7b-instruct | 4096 | Highest quality | 14 GB |
For production RAG systems, `bge-large-en-v1.5` offers an excellent balance of quality and speed. For mobile or latency-sensitive applications, `all-MiniLM-L6-v2` is the standard choice.
## Using Embeddings in Practice

### Generating Embeddings
```python
from sentence_transformers import SentenceTransformer

# Downloads on first use, then loads from the local cache
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

documents = [
    "Embeddings represent text as dense vectors.",
    "Semantic search uses vector similarity to find relevant results.",
    "Vector databases store and index embeddings for fast retrieval.",
]

# normalize_embeddings=True scales each vector to unit length,
# so plain dot products are cosine similarities
embeddings = model.encode(documents, normalize_embeddings=True)
print(f"Shape: {embeddings.shape}")  # (3, 1024)
```
### Measuring Similarity
Cosine similarity is the standard metric for comparing embeddings:
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b)  # for normalized vectors, dot product = cosine

query_embedding = model.encode("How do vectors represent meaning?", normalize_embeddings=True)
scores = [cosine_similarity(query_embedding, doc_emb) for doc_emb in embeddings]
```
When using normalized embeddings, dot product and cosine similarity are identical, simplifying computation.
### Dimensionality Reduction
For visualization, reduce embeddings to 2D or 3D using UMAP or t-SNE:
```python
import umap

# Project the 1024-dim embeddings down to 2D for plotting
reducer = umap.UMAP(n_components=2, random_state=42)
embeddings_2d = reducer.fit_transform(embeddings)
```
UMAP preserves more global structure than t-SNE and is significantly faster for large datasets.
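A quick scatter plot then shows the layout (the labeling choices here are just illustrative):

```python
import matplotlib.pyplot as plt

plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1])
for i, doc in enumerate(documents):
    plt.annotate(doc[:30], embeddings_2d[i])  # first 30 chars as a label
plt.title("Document embeddings (UMAP, 2D)")
plt.show()
```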
## Advanced Techniques

### Multi-Task Embeddings
Some models produce better task-specific embeddings when a task prefix is prepended to the input. For example, the `bge` v1.5 models recommend prefixing retrieval queries with `Represent this sentence for searching relevant passages: ` while encoding the documents themselves without any prefix.
This simple technique improves retrieval accuracy by 2-5% in production systems.
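Continuing with the model loaded earlier, the usage looks roughly like this (the prefix string is BGE's documented query instruction; the rest is a sketch):

```python
# Queries get the BGE retrieval instruction; documents do not
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

query_embedding = model.encode(
    QUERY_PREFIX + "How do vectors represent meaning?",
    normalize_embeddings=True,
)
doc_embeddings = model.encode(documents, normalize_embeddings=True)
scores = doc_embeddings @ query_embedding  # cosine scores (vectors are normalized)
```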
### Matryoshka Embeddings
Matryoshka embedding models (like OpenAI's text-embedding-3 series) produce vectors where the first N dimensions form a valid, lower-quality embedding. You can truncate the vector at inference time to trade accuracy for speed and storage — using 256 dimensions for initial retrieval and 1536 for re-ranking.
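A minimal sketch of inference-time truncation (this only pays off for models actually trained with a Matryoshka objective, and truncated vectors must be re-normalized before cosine comparisons):

```python
import numpy as np

def truncate_embedding(vec, dims=256):
    """Keep the first `dims` dimensions and re-normalize to unit length."""
    truncated = np.asarray(vec, dtype=np.float32)[:dims]
    return truncated / np.linalg.norm(truncated)

# Hypothetical 1536-dim Matryoshka vector, truncated for a cheap first pass
full_vec = np.random.rand(1536).astype(np.float32)
short_vec = truncate_embedding(full_vec, dims=256)
```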
## Common Pitfalls
1. **Comparing unnormalized vectors with a raw dot product**: results are dominated by vector magnitude rather than direction; normalize first (see the sketch after this list)
2. **Ignoring domain mismatch**: embedding models trained on general web text may perform poorly on medical or legal text
3. **Fixed chunk sizes in RAG**: large chunks dilute meaning; small chunks lose context
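A small numeric illustration of the first pitfall (the vectors are made up for demonstration):

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([10.0, 1.0])  # nearly the same direction, much larger magnitude

print(np.dot(a, b))  # 10.0 -- dominated by b's magnitude
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(np.dot(a_unit, b_unit))  # ~0.995 -- the actual cosine similarity
```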
## Conclusion

Embeddings are a versatile and powerful tool in the AI practitioner's toolkit. They enable semantic search, document clustering, and recommendation systems, and they form the backbone of RAG architectures. Choosing the right model, normalizing properly, and understanding the trade-off between dimension count and performance will determine the success of your embedding-based application.