POST /vl-embedding
Example request:

```bash
curl --request POST \
  --url https://api.octen.ai/vl-embedding \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "model": "octen-vl-embedding",
  "input": {
    "contents": [
      {
        "text": "What is multimodal vector search?"
      }
    ]
  }
}
'
```
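The same call can be made from Python with only the standard library. The sketch below assumes nothing beyond the URL, headers, and body shown in the curl example. `build_request` and `send` are hypothetical helper names. Building is kept separate from sending so that the request can be inspected without network access.

```python
import json
import urllib.request

API_URL = "https://api.octen.ai/vl-embedding"


def build_request(api_key: str, contents: list, model: str = "octen-vl-embedding") -> urllib.request.Request:
    """Build, but do not send, a POST request mirroring the curl example."""
    payload = {"model": model, "input": {"contents": contents}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body (requires network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a real API key):
# body = send(build_request("<api-key>", [{"text": "What is multimodal vector search?"}]))
```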
Example response (illustrative: the model name, fusion type, and image usage figures do not match the text-only request above, and the embedding array is abbreviated):

```json
{
  "code": 0,
  "msg": "success",
  "request_id": "a7b8c9d0-e1f2-3456-abcd-789012345678",
  "data": {
    "results": [
      {
        "index": 0,
        "embedding": [
          0.0156,
          -0.0298,
          0.0411
        ],
        "type": "fusion"
      }
    ],
    "model": "octen-vl-embedding-large"
  },
  "meta": {
    "usage": {
      "input_tokens": 6814,
      "text_tokens": 18,
      "image_tokens": 6796,
      "image_count": 2,
      "duration": 22
    },
    "warning": null
  }
}
```
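A client typically checks `code` before reading `data.results`. The sketch below is a minimal helper under that assumption. `extract_embeddings` is a hypothetical name, and results are sorted by their `index` field in case the server does not return them in order.

```python
import json


def extract_embeddings(body: dict) -> list:
    """Return the embedding vectors ordered by `index`, or raise if code != 0."""
    if body.get("code") != 0:
        raise RuntimeError(f"request {body.get('request_id')} failed: {body.get('msg')}")
    results = sorted(body["data"]["results"], key=lambda r: r["index"])
    return [r["embedding"] for r in results]


# The example response, trimmed to the fields used here.
sample = json.loads("""
{"code": 0, "msg": "success", "request_id": "a7b8c9d0-e1f2-3456-abcd-789012345678",
 "data": {"results": [{"index": 0, "embedding": [0.0156, -0.0298, 0.0411], "type": "fusion"}],
          "model": "octen-vl-embedding-large"}}
""")
print(extract_embeddings(sample))  # [[0.0156, -0.0298, 0.0411]]
```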


Authorizations

x-api-key (string, header, required)

API key used to authenticate the request. You must obtain an API key before calling the API, and a payment method is required to use it.

Body (application/json)

model (enum<string>, required)

The multimodal embedding model used for this request.

Available options: octen-vl-embedding, octen-vl-embedding-large
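input (object, required)

The multimodal content to be vectorized, supplied as an array of elements in input.contents (the request example uses a single element with a text field). Supports text, images, videos, and combinations of these. Per-request limits: at most 20 elements in total, of which at most 5 may be images and at most 1 may be a video.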
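Checking the per-request limits on the client avoids a wasted round trip. The following is a minimal sketch. Only the `text` element key appears in this page's example, so the `image` and `video` keys used to classify elements are hypothetical placeholders and should be replaced with the real field names.

```python
MAX_ELEMENTS = 20  # documented maximum total elements per request
MAX_IMAGES = 5     # documented maximum images per request
MAX_VIDEOS = 1     # documented maximum videos per request


def check_contents(contents: list) -> None:
    """Raise ValueError if `contents` exceeds the documented per-request limits.

    Assumption: image and video elements carry hypothetical "image" and
    "video" keys; only "text" is confirmed by the request example.
    """
    if len(contents) > MAX_ELEMENTS:
        raise ValueError(f"{len(contents)} elements exceeds the limit of {MAX_ELEMENTS}")
    images = sum(1 for c in contents if "image" in c)
    videos = sum(1 for c in contents if "video" in c)
    if images > MAX_IMAGES:
        raise ValueError(f"{images} images exceeds the limit of {MAX_IMAGES}")
    if videos > MAX_VIDEOS:
        raise ValueError(f"{videos} videos exceeds the limit of {MAX_VIDEOS}")
```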

enable_fusion (boolean, default: false)

Whether to generate a fused embedding. When true, all elements in input.contents are fused into a single vector; when false, each element produces its own independent vector.
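The flag determines how many vectors to expect in data.results: one when fusion is on, one per element otherwise. The sketch below encodes that rule. It assumes that enable_fusion sits at the top level of the body next to model and input, as the Body parameter list suggests. Both helper names are hypothetical.

```python
def build_payload(contents: list, enable_fusion: bool = False,
                  model: str = "octen-vl-embedding") -> dict:
    """Request body with enable_fusion set explicitly (top-level placement assumed)."""
    return {"model": model, "input": {"contents": contents}, "enable_fusion": enable_fusion}


def expected_result_count(contents: list, enable_fusion: bool = False) -> int:
    """Vectors expected in data.results: one fused vector, or one per element."""
    return 1 if enable_fusion else len(contents)


contents = [{"text": "a red bicycle"}, {"text": "a blue car"}]
print(expected_result_count(contents))                      # 2
print(expected_result_count(contents, enable_fusion=True))  # 1
```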

dimension (integer)

The dimensionality of the output embedding vectors. Defaults to the model's maximum dimension (2048 for octen-vl-embedding, 4096 for octen-vl-embedding-large). Any positive integer up to the model's maximum is allowed.
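The default and the allowed range depend on the model. The following minimal sketch resolves the effective dimension from the documented maxima. `resolve_dimension` is a hypothetical helper and performs only this client-side check.

```python
from typing import Optional

# Documented maximum (and default) output dimension per model.
MAX_DIMENSION = {"octen-vl-embedding": 2048, "octen-vl-embedding-large": 4096}


def resolve_dimension(model: str, dimension: Optional[int] = None) -> int:
    """Return the dimension the request will produce, validating any explicit value."""
    max_dim = MAX_DIMENSION[model]
    if dimension is None:
        return max_dim  # omitted: the model's maximum dimension is used
    if isinstance(dimension, bool) or not isinstance(dimension, int) or not 1 <= dimension <= max_dim:
        raise ValueError(f"dimension must be an integer in [1, {max_dim}] for {model}")
    return dimension
```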

fps (number, default: 1)

Controls the frame sampling density for video inputs. Smaller values extract fewer frames and therefore consume fewer video tokens.

Required range: 0 ≤ fps ≤ 1
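To gauge the effect of lowering fps, the sketch below validates the documented range and produces a rough frame count. The count assumes that frames are sampled uniformly at `fps` frames per second of video. That sampling model is an assumption and is not documented here, so the actual frame and token counts may differ. `approx_frame_count` is a hypothetical helper.

```python
import math


def check_fps(fps: float = 1.0) -> float:
    """Validate fps against the documented range 0 <= fps <= 1."""
    if not 0 <= fps <= 1:
        raise ValueError(f"fps must be between 0 and 1, got {fps}")
    return fps


def approx_frame_count(duration_s: float, fps: float = 1.0) -> int:
    """Rough frame count, ASSUMING uniform sampling at `fps` frames per second."""
    return math.floor(duration_s * check_fps(fps))


print(approx_frame_count(60))       # 60 (default fps = 1)
print(approx_frame_count(60, 0.5))  # 30
```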
instruct (string | null)

Custom task description that guides the model in interpreting the intent of the query. Its length counts toward input_tokens, and it shares the 32,000-token total context limit with input.contents.
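A common pattern with instruction-aware embedding models is to attach the task description to query requests and leave it off document requests. Whether that pattern suits this API is an assumption. The sketch also assumes that instruct sits at the top level of the body. The helper name and the instruction text are hypothetical.

```python
from typing import Optional


def build_payload_with_instruct(contents: list, instruct: Optional[str] = None,
                                model: str = "octen-vl-embedding") -> dict:
    """Request body that includes instruct only when one is given (top-level placement assumed)."""
    payload = {"model": model, "input": {"contents": contents}}
    if instruct is not None:
        payload["instruct"] = instruct  # counts toward input_tokens and the 32,000-token limit
    return payload


# Hypothetical instruction text for a retrieval query.
query = build_payload_with_instruct(
    [{"text": "What is multimodal vector search?"}],
    instruct="Retrieve images and passages that answer the question.",
)
document = build_payload_with_instruct([{"text": "Vector search ranks items by embedding similarity."}])
```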

Response

A successful VL embedding response contains the following fields.

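code (integer)

Business status code. 0 indicates success.

msg (string)

A human-readable message describing the result.

request_id (string)

The unique identifier for this request.

data (object)

The main VL embedding response payload, containing the embedding results and the name of the model that produced them.

meta (object)

Additional metadata for the VL embedding request, including token usage (meta.usage) and any warning (meta.warning).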
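Because instruct and the contents share a 32,000-token context limit, meta.usage.input_tokens shows how close a request came to that limit. The sketch below reports the remaining headroom using the usage block from the example response. `summarize_usage` is a hypothetical helper, and it does not interpret the `duration` field, whose unit is not documented here.

```python
CONTEXT_LIMIT = 32_000  # documented total token limit shared by instruct and contents


def summarize_usage(body: dict) -> dict:
    """Report input tokens, remaining context headroom, and any warning from meta."""
    usage = body["meta"]["usage"]
    return {
        "input_tokens": usage["input_tokens"],
        "remaining_context": CONTEXT_LIMIT - usage["input_tokens"],
        "warning": body["meta"].get("warning"),
    }


# meta block from the example response
example = {"meta": {"usage": {"input_tokens": 6814, "text_tokens": 18, "image_tokens": 6796,
                              "image_count": 2, "duration": 22}, "warning": None}}
print(summarize_usage(example))
# {'input_tokens': 6814, 'remaining_context': 25186, 'warning': None}
```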