What are Embeddings
Embeddings are numerical vector representations of data that encode semantic meaning in a continuous, high-dimensional space. Each embedding is a fixed-length vector of real numbers produced by an embedding model. Embeddings preserve semantic relationships geometrically: inputs with similar meaning map to vectors that are close together in the embedding space, while unrelated inputs map to vectors that are farther apart. Embeddings are a form of representation, not a retrieval or generation mechanism by themselves; they provide a standardized way to transform unstructured content into a format that can be indexed and compared for semantic retrieval.
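To make the geometry concrete, the following minimal sketch computes cosine similarity over made-up 4-dimensional vectors. The numbers are purely illustrative (real embedding models output hundreds or thousands of dimensions); only the similarity math matters here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for vectors pointing the same way,
    close to 0 for unrelated directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for real embedding-model output.
cat     = np.array([0.9, 0.1, 0.3, 0.0])  # "a cat sat on the mat"
kitten  = np.array([0.8, 0.2, 0.4, 0.1])  # "a kitten rested on the rug"
invoice = np.array([0.0, 0.9, 0.0, 0.8])  # "quarterly invoice totals"

print(cosine_similarity(cat, kitten))    # high: related meaning
print(cosine_similarity(cat, invoice))   # low: unrelated meaning
```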
How Embeddings Work
Step 1: Content Embedding
Source content such as documents, webpages, and internal knowledge is prepared for embedding. Large documents are typically split into smaller chunks so that each chunk represents a coherent unit of meaning. Each chunk is then passed through an embedding model to produce a vector representation. The resulting embeddings collectively represent the semantic structure of the content corpus.
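A minimal sketch of this step is shown below. The chunking strategy (fixed character windows with overlap) and the `embed_texts()` stand-in are illustrative assumptions, not part of any specific API; swap the stand-in for a call to your embedding model.

```python
import numpy as np
from typing import List

def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> List[str]:
    """Split a long document into overlapping character windows so each chunk
    stays a coherent, embeddable unit. Real pipelines often split on paragraph
    or sentence boundaries instead."""
    chunks, start, step = [], 0, max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

def embed_texts(texts: List[str], dim: int = 1024) -> np.ndarray:
    """Stand-in for a real embedding call: returns one fixed-length vector per
    input text. Replace the body with a request to your embedding model."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), dim)).astype("float32")

document = "..."                      # load your source content here
chunks = chunk_text(document)
vectors = embed_texts(chunks)         # shape: (num_chunks, dim)
```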
Step 2: Build a Vector Index
The generated embeddings are stored in a vector index for similarity search. Indexing structures such as HNSW and IVF organize vector representations so that the most similar vectors can be located quickly, enabling scalable semantic retrieval without scanning every vector.
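One possible sketch of this step uses the open-source FAISS library to build an HNSW index; the library choice and parameter values are assumptions, and any vector database or ANN library with HNSW/IVF support follows the same pattern.

```python
import faiss
import numpy as np

dim = 1024
vectors = np.random.rand(10_000, dim).astype("float32")  # chunk embeddings from Step 1

# HNSW graph index; M (here 32) is how many neighbors each node keeps.
# IndexHNSWFlat defaults to L2 distance; for cosine similarity, normalize the
# vectors and pass faiss.METRIC_INNER_PRODUCT instead.
index = faiss.IndexHNSWFlat(dim, 32)
index.hnsw.efConstruction = 200   # build-time accuracy/speed trade-off
index.add(vectors)                # insert all chunk embeddings

index.hnsw.efSearch = 64          # query-time recall/latency trade-off
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)   # approximate top-5 neighbors
print(ids[0], distances[0])
```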
Step 3: Query Embedding and Retrieval
When a user submits a query, the query text is converted into a vector using the same embedding model applied to the indexed content, ensuring that the query and the content share a single vector space. The query vector is compared against the indexed vectors using a similarity metric such as cosine similarity, Euclidean distance, or dot product, and the top-K most relevant results are retrieved. These results constitute the semantic retrieval output and provide relevant context for re-ranking or answer generation.
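The sketch below shows the retrieval math in its simplest brute-force form: normalize the vectors, score every chunk with a dot product (equivalent to cosine similarity on unit vectors), and keep the top-K. In production, the vector index from Step 2 performs this search approximately and at scale; the vectors and chunk texts here are random placeholders.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Placeholder corpus: chunk embeddings produced by the same model as the query.
chunk_vectors = normalize(np.random.rand(1000, 1024).astype("float32"))
chunk_texts = [f"chunk {i}" for i in range(1000)]

query_vector = normalize(np.random.rand(1024).astype("float32"))

# Cosine similarity reduces to a dot product once vectors are unit-length.
scores = chunk_vectors @ query_vector
top_k = np.argsort(-scores)[:5]

for i in top_k:
    print(f"{scores[i]:.3f}  {chunk_texts[i]}")
```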
Comparison with Other Retrieval Approaches
Embedding-based retrieval represents and matches content by semantic meaning, and commonly serves as the semantic recall layer in modern search systems. The following table compares it with other widely used retrieval approaches across key dimensions; a minimal sketch of hybrid score fusion follows the table.
| Retrieval Approach | Representation | Retrieval Mechanism | Strengths | Limitations |
|---|---|---|---|---|
| Keyword Retrieval | Tokens | Matches query terms using inverted indexes and statistical relevance | Stable, interpretable, strong exact matching | Limited semantic understanding |
| Embedding Retrieval | Embeddings | Computes semantic similarity between query and content vectors | Strong semantic recall; supports natural language and multilingual queries | Depends on embedding model quality; less effective for exact matching and filters |
| Hybrid Retrieval | Tokens + Embeddings | Combines keyword matching and semantic similarity during retrieval | Balances precision and semantic recall | Higher system complexity |
| Generative-only Answering | Model-internal representations | Produces answers directly without retrieving external content | Natural responses; simple interaction | Prone to hallucinations; no access to real-time information; no traceable sources |
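For the hybrid approach above, one common fusion technique is reciprocal rank fusion (RRF). This sketch assumes you already have two ranked ID lists, one from a keyword retriever (e.g. BM25 over an inverted index) and one from the vector index; the document IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse multiple ranked result lists: each document earns 1 / (k + rank)
    per list it appears in, so items ranked well by either retriever rise."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

keyword_hits  = ["doc7", "doc2", "doc9"]   # from the keyword retriever
semantic_hits = ["doc2", "doc5", "doc7"]   # from the vector index

print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```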
Model Choices
Octen offers a family of embedding models in three sizes, allowing you to choose the optimal balance between performance and cost.
| Model | Context Length (tokens) | Embedding Dimension | Description |
|---|---|---|---|
| octen-embedding-8b | 32,768 | 4096 | State-of-the-art (SOTA) embedding model with the highest accuracy in the family. Best for use cases where retrieval quality is critical. |
| octen-embedding-4b | 32,768 | 2560 | Balanced performance and cost. Recommended for most production workloads. |
| octen-embedding-0.6b | 32,768 | 1024 | Lightweight and cost-efficient. Ideal for cost-sensitive or high-volume applications. |
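As a sketch of putting a chosen model to work, the example below assumes the embedding service is exposed through an OpenAI-compatible `/v1/embeddings` endpoint; the base URL, environment variables, and client shape are assumptions, so consult the actual Octen API reference before relying on them.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and credentials; adjust to the real API.
client = OpenAI(
    base_url=os.environ.get("OCTEN_BASE_URL", "https://api.example.com/v1"),
    api_key=os.environ["OCTEN_API_KEY"],
)

MODEL = "octen-embedding-4b"   # balanced default per the table above

response = client.embeddings.create(
    model=MODEL,
    input=["How do I rotate my API key?", "Billing cycles and invoices"],
)

vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))   # 2 vectors, 2560 dimensions for the 4b model
```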