What are Embeddings

Embeddings are numerical vector representations of data that encode semantic meaning in a continuous, high-dimensional space. Each embedding is a fixed-length vector of real numbers produced by an embedding model. Embeddings preserve semantic relationships geometrically: inputs with similar meaning are represented by vectors that are closer together in the embedding space, while unrelated inputs are farther apart. Embeddings are a form of representation, not a retrieval or generation mechanism by themselves. They provide a standardized way to transform unstructured content into a format that can be indexed and compared for semantic retrieval.
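
To make the geometric intuition concrete, the following minimal sketch compares toy vectors with cosine similarity; the vectors and the resulting values are illustrative only, not output from a real embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for similar direction, near 0.0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
v_cat    = np.array([0.9, 0.1, 0.0, 0.2])
v_kitten = np.array([0.8, 0.2, 0.1, 0.3])
v_stock  = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(v_cat, v_kitten))  # high: semantically related inputs
print(cosine_similarity(v_cat, v_stock))   # low: unrelated inputs
```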

How Embeddings Work

Step 1: Content Embedding

Source content such as documents, webpages, and internal knowledge is prepared for embedding. Large documents are typically split into smaller chunks so that each chunk represents a coherent unit of meaning. Each chunk is then passed through an embedding model to produce a vector representation. The resulting embeddings collectively represent the semantic structure of the content corpus.
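
A minimal sketch of this step is shown below, assuming a placeholder `embed` helper that stands in for whatever embedding model or API you call; the chunking strategy is a naive fixed-size split, used here only for illustration.

```python
from typing import List
import numpy as np

def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Naive fixed-size chunking with overlap; real pipelines often split on
    sentence or paragraph boundaries so each chunk stays semantically coherent."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def embed(texts: List[str]) -> np.ndarray:
    """Placeholder for the embedding model call; returns one fixed-length vector
    per input text (random stand-ins here, for illustration only)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 1024))

documents = ["<long document text>", "<another document>"]   # placeholder corpus
chunks = [c for doc in documents for c in chunk_text(doc)]   # one coherent unit per chunk
embeddings = embed(chunks)                                   # shape: (num_chunks, 1024)
```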

Step 2: Build a Vector Index

The generated embeddings are stored in a vector index for similarity search. Indexing structures such as HNSW and IVF organize the vectors so that the most similar ones can be located quickly, enabling scalable semantic retrieval without scanning every vector.
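
As one possible sketch, the example below builds an HNSW index with the faiss library (assuming it is installed); most vector databases expose a similar add-then-search workflow.

```python
import numpy as np
import faiss

dim = 1024
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, dim)).astype("float32")  # stand-in chunk embeddings

# L2-normalize so that nearest-by-L2 ranking matches cosine-similarity ranking.
faiss.normalize_L2(corpus)

index = faiss.IndexHNSWFlat(dim, 32)  # 32 = neighbors per node in the HNSW graph
index.add(corpus)                     # build the index once, query it many times

print(index.ntotal)                   # number of indexed vectors
```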

Step 3: Query Embedding and Retrieval

When a user submits a query, the query text is converted into a vector representation using the same embedding model applied to the indexed content. This ensures that the query and the content are represented in a shared vector space. The query vector is compared against the indexed vectors using a similarity metric, such as cosine similarity, Euclidean distance, or dot product, to retrieve the top-K most relevant results. The retrieved results represent the semantic retrieval output and provide relevant context for re-ranking or answer generation.
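
The sketch below shows the query-time comparison with brute-force cosine similarity over stand-in vectors; a production system would query the vector index from the previous step instead of scanning every vector.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus_embeddings = rng.normal(size=(1_000, 1024)).astype("float32")  # indexed chunk vectors
query_embedding = rng.normal(size=(1024,)).astype("float32")          # same model, same space

# Cosine similarity = dot product of L2-normalized vectors.
corpus_norm = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

scores = corpus_norm @ query_norm          # one similarity score per indexed chunk
top_k = 5
top_idx = np.argsort(-scores)[:top_k]      # indices of the top-K most similar chunks

print(list(zip(top_idx.tolist(), scores[top_idx].round(3).tolist())))
```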

Comparison with Other Retrieval Approaches

Embedding-based retrieval represents and matches content by semantic meaning, and commonly serves as the semantic recall layer in modern search systems. The following table compares it with other widely used retrieval approaches across key dimensions; a minimal hybrid-scoring sketch follows the table.
| Retrieval Approach | Representation | Retrieval Mechanism | Strengths | Limitations |
|---|---|---|---|---|
| Keyword Retrieval | Tokens | Matches query terms using inverted indexes and statistical relevance | Stable, interpretable, strong exact matching | Limited semantic understanding |
| Embedding Retrieval | Embeddings | Computes semantic similarity between query and content vectors | Strong semantic recall; supports natural language and multilingual queries | Depends on embedding model quality; less effective for exact matching and filters |
| Hybrid Retrieval | Tokens + Embeddings | Combines keyword matching and semantic similarity during retrieval | Balances precision and semantic recall | Higher system complexity |
| Generative-only Answering | Model-internal representations | Produces answers directly without retrieving external content | Natural responses; simple interaction | Prone to hallucinations; no access to real-time information; no traceable sources |
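
One common way to combine the keyword and embedding rankings in hybrid retrieval is reciprocal rank fusion (RRF), sketched below with illustrative document ids; this is one option among several, not a prescribed implementation.

```python
from collections import defaultdict

def rrf_fuse(rankings, k: int = 60):
    """Fuse multiple ranked lists of document ids into a single hybrid ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc3", "doc1", "doc7"]    # e.g. from an inverted-index / BM25 search
embedding_ranking = ["doc1", "doc5", "doc3"]  # e.g. from vector similarity search

print(rrf_fuse([keyword_ranking, embedding_ranking]))  # fused hybrid ranking
```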

Model Choices

Octen offers a family of embedding models in three sizes, allowing you to choose the balance between performance and cost that fits your workload; a hypothetical request sketch follows the table.
| Model | Context Length (tokens) | Embedding Dimension | Description |
|---|---|---|---|
| octen-embedding-8b | 32,768 | 4096 | State-of-the-art (SOTA) embedding model with the best accuracy in the world. Best for use cases where retrieval quality is critical. |
| octen-embedding-4b | 32,768 | 2560 | Balanced performance and cost. Recommended for most production workloads. |
| octen-embedding-0.6b | 32,768 | 1024 | Lightweight and cost-efficient. Ideal for cost-sensitive or high-volume applications. |
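
The sketch below requests embeddings over HTTP; the endpoint URL, authentication header, payload shape, and response format are assumptions for illustration and should be replaced with the actual Octen API details.

```python
import requests

API_URL = "https://api.example.com/v1/embeddings"  # placeholder endpoint, not the real URL
API_KEY = "YOUR_API_KEY"                           # placeholder credential

payload = {
    "model": "octen-embedding-4b",                 # balanced choice from the table above
    "input": ["How do I reset my password?", "Quarterly security review checklist"],
}

resp = requests.post(API_URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]  # assumed response shape
print(len(vectors), len(vectors[0]))  # e.g. 2 x 2560 for octen-embedding-4b
```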

Common Use Cases

Semantic Search

Embeddings enable systems to understand and retrieve information based on intent and meaning rather than exact keyword matches. This lets users express their needs in natural language, since they don’t always remember exact keywords or terminology. Semantic search is widely applied across content platforms, internal knowledge systems, and applications where users look for information, products, or answers with free-form queries. By operating in a shared semantic space, embeddings enable a consistent search experience across varied phrasing, domains, and languages. This improves search relevance and usability for users, while reducing the complexity of building and maintaining search across large, diverse datasets.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is an approach that combines information retrieval with language model generation, using external content to inform responses. In RAG systems, embedding-based retrieval grounds responses in retrieved content, leading to more accurate and trustworthy answers, while bringing proprietary or frequently updated knowledge into generative applications without retraining LLMs. This makes RAG well suited for applications where up-to-date, domain-specific, or organizational knowledge must be incorporated, such as internal knowledge assistants, compliance and regulatory tools, and multi-step reasoning workflows. Compared with semantic search, embeddings in RAG serve not only to retrieve relevant content but also to guide and constrain the model’s generated output.
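
A minimal RAG sketch follows, with placeholder `embed` and `generate` functions standing in for the embedding model and the LLM; the corpus and question are illustrative.

```python
import numpy as np

def embed(texts):
    """Placeholder embedding call; returns one random stand-in vector per text."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 1024))

def generate(prompt: str) -> str:
    """Placeholder LLM call."""
    return f"(model answer grounded in: {prompt[:60]}...)"

chunks = [
    "Employees may carry over up to 5 unused vacation days per year.",
    "Expense reports must be filed within 30 days of purchase.",
    "VPN access requires hardware-token enrollment.",
]
chunk_vecs = np.asarray(embed(chunks))

question = "How many vacation days can I carry over?"
q_vec = np.asarray(embed([question]))[0]

# Retrieve the top-2 chunks by cosine similarity and pass them as context to the LLM.
sims = (chunk_vecs @ q_vec) / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
context = "\n".join(chunks[i] for i in np.argsort(-sims)[:2])

prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(generate(prompt))
```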

Other Use Cases

Clustering and Similarity: Grouping content with similar semantic representations
Recommendations: Identifying related items based on embedding proximity
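
As a brief sketch of the clustering case, assuming scikit-learn is installed and using random stand-in vectors in place of real embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1024))  # one stand-in vector per content item

# Group items whose embeddings are close together in the vector space.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)
print(labels[:10])  # cluster assignment per item; nearby vectors share a cluster
```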