## Introduction
Embeddings are the foundation of modern AI applications — powering semantic search, recommendation systems, clustering, and classification. At their core, embeddings convert text, images, or other data into dense vector representations that capture semantic meaning. This guide explains what embeddings are, how they work, and how to use them effectively.
## What Are Embeddings?
An embedding is a numerical representation of data — typically a list of floating-point numbers — where the position in vector space encodes semantic meaning. Similar concepts cluster together, while dissimilar concepts are far apart.
For example, the embeddings for "king" and "queen" would be closer to each other than to "bicycle." More importantly, embeddings capture analogical relationships: the vector operation `king - man + woman` produces a vector close to `queen`.
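This analogy arithmetic is easiest to demonstrate with classic word vectors. Here is a minimal sketch using gensim and pretrained GloVe vectors (the `glove-wiki-gigaword-100` model name is an assumption about what gensim's downloader provides):

```python
# Sketch: the classic king - man + woman analogy with pretrained GloVe vectors.
# Assumes gensim is installed and can fetch "glove-wiki-gigaword-100" (~130 MB).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest neighbors of the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" should rank at or near the top
```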
Modern embedding models produce vectors with 384 to 3072 dimensions. The trade-off is speed versus information density: smaller vectors are faster to compare but capture less nuance.
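Dimension count also drives storage cost, which matters at scale. A back-of-envelope calculation for a hypothetical corpus (the corpus size is illustrative):

```python
# Storage cost of float32 embeddings for a hypothetical 1M-document corpus
n_vectors = 1_000_000
for dims in (384, 1024, 3072):
    gb = n_vectors * dims * 4 / 1e9  # 4 bytes per float32
    print(f"{dims} dims: {gb:.1f} GB")
# 384 dims: 1.5 GB, 1024 dims: 4.1 GB, 3072 dims: 12.3 GB
```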
## How Embeddings Are Generated
Embedding models are trained using contrastive learning. The model learns to pull semantically similar texts together in vector space while pushing dissimilar texts apart.
The training signal typically comes from pairs of texts known to be related, such as question-answer pairs, paraphrases, or query-click data. The most common loss function is **contrastive loss** or **InfoNCE loss**, which treats each example's paired text as the positive and the other texts in the same batch as negatives, providing training signal at scale.
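To make the in-batch idea concrete, here is a minimal InfoNCE sketch in PyTorch (the temperature of 0.05 and the paired-batch setup are assumptions for illustration, not any particular model's recipe):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, doc_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE: query i's positive is doc i; all other docs are negatives."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # (batch, batch) matrix of cosine similarities; entry [i, j] compares
    # query i against document j
    logits = query_emb @ doc_emb.T / temperature
    # The correct "class" for row i is column i (the paired document)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```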
## Popular Embedding Models
| Model | Dimensions | Best For | Size |
|-------|-----------|----------|------|
| text-embedding-3-small | 512-1536 (configurable) | General purpose | API-only |
| text-embedding-3-large | 256-3072 (configurable) | High accuracy | API-only |
| BAAI/bge-large-en-v1.5 | 1024 | RAG and search | 1.3 GB |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Speed-critical apps | 80 MB |
| intfloat/e5-mistral-7b-instruct | 4096 | Highest quality | 14 GB |
For production RAG systems, `bge-large-en-v1.5` offers an excellent balance of quality and speed. For mobile or latency-sensitive applications, `all-MiniLM-L6-v2` is the standard choice.
## Using Embeddings in Practice

### Generating Embeddings
```python
from sentence_transformers import SentenceTransformer

# Downloads on first use, then loads from the local cache
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

documents = [
    "Embeddings represent text as dense vectors.",
    "Semantic search uses vector similarity to find relevant results.",
    "Vector databases store and index embeddings for fast retrieval.",
]

# normalize_embeddings=True scales each vector to unit length,
# so plain dot products are cosine similarities
embeddings = model.encode(documents, normalize_embeddings=True)
print(f"Shape: {embeddings.shape}")  # (3, 1024)
```
### Measuring Similarity
Cosine similarity is the standard metric for comparing embeddings:
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b)  # for normalized vectors, dot product = cosine

query_embedding = model.encode("How do vectors represent meaning?", normalize_embeddings=True)
scores = [cosine_similarity(query_embedding, doc_emb) for doc_emb in embeddings]
```
When using normalized embeddings, dot product and cosine similarity are identical, simplifying computation.
### Dimensionality Reduction
For visualization, reduce embeddings to 2D or 3D using UMAP or t-SNE:
```python
import umap

# Project the 1024-dim embeddings down to 2D for plotting
reducer = umap.UMAP(n_components=2, random_state=42)
embeddings_2d = reducer.fit_transform(embeddings)
```
UMAP preserves more global structure than t-SNE and is significantly faster for large datasets.
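A quick scatter plot then shows the layout (the labeling choices here are just illustrative):

```python
import matplotlib.pyplot as plt

plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1])
for i, doc in enumerate(documents):
    plt.annotate(doc[:30], embeddings_2d[i])  # first 30 chars as a label
plt.title("Document embeddings (UMAP, 2D)")
plt.show()
```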
## Advanced Techniques

### Multi-Task Embeddings
Some models produce better task-specific embeddings when a task prefix is prepended to the input. For example, the `bge` v1.5 models recommend prefixing retrieval queries with `Represent this sentence for searching relevant passages: ` while encoding the documents themselves without any prefix.
This simple technique improves retrieval accuracy by 2-5% in production systems.
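Continuing with the model loaded earlier, the usage looks roughly like this (the prefix string is BGE's documented query instruction; the rest is a sketch):

```python
# Queries get the BGE retrieval instruction; documents do not
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

query_embedding = model.encode(
    QUERY_PREFIX + "How do vectors represent meaning?",
    normalize_embeddings=True,
)
doc_embeddings = model.encode(documents, normalize_embeddings=True)
scores = doc_embeddings @ query_embedding  # cosine scores (vectors are normalized)
```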
### Matryoshka Embeddings
Matryoshka embedding models (like OpenAI's text-embedding-3 series) produce vectors where the first N dimensions form a valid, lower-quality embedding. You can truncate the vector at inference time to trade accuracy for speed and storage — using 256 dimensions for initial retrieval and 1536 for re-ranking.
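A minimal sketch of inference-time truncation (this only pays off for models actually trained with a Matryoshka objective, and truncated vectors must be re-normalized before cosine comparisons):

```python
import numpy as np

def truncate_embedding(vec, dims=256):
    """Keep the first `dims` dimensions and re-normalize to unit length."""
    truncated = np.asarray(vec, dtype=np.float32)[:dims]
    return truncated / np.linalg.norm(truncated)

# Hypothetical 1536-dim Matryoshka vector, truncated for a cheap first pass
full_vec = np.random.rand(1536).astype(np.float32)
short_vec = truncate_embedding(full_vec, dims=256)
```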
## Common Pitfalls
1. **Comparing unnormalized vectors with a raw dot product**: results are dominated by vector magnitude rather than direction; normalize first (see the sketch after this list)
2. **Ignoring domain mismatch**: embedding models trained on general web text may perform poorly on medical or legal text
3. **Fixed chunk sizes in RAG**: large chunks dilute meaning; small chunks lose context
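A small numeric illustration of the first pitfall (the vectors are made up for demonstration):

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([10.0, 1.0])  # nearly the same direction, much larger magnitude

print(np.dot(a, b))  # 10.0 -- dominated by b's magnitude
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(np.dot(a_unit, b_unit))  # ~0.995 -- the actual cosine similarity
```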
## Conclusion

Embeddings are a versatile and powerful tool in the AI practitioner's toolkit. They enable semantic search, document clustering, and recommendation systems, and they form the backbone of RAG architectures. Choosing the right model, normalizing properly, and understanding the trade-off between dimension count and performance will determine the success of your embedding-based application.