Chat Completions
Creates a chat completion. Compatible with the OpenAI Chat Completions protocol, with an optional built-in octen_search Web Search tool.
Authorizations
API key used for request authentication. Obtain an API key before using the API. Note: A payment method is required to use the API.
Body
Request body for the Chat Completions API. Some parameters apply only to certain models; unsupported parameters are ignored for the selected model.
The model to use for chat completion.
anthropic/claude-opus-4.8, anthropic/claude-opus-4.6, anthropic/claude-sonnet-4.6, anthropic/claude-haiku-4.5, google/gemini-3.5-flash, google/gemini-3.1-pro-preview, google/gemini-3.1-flash-lite, google/gemini-3-flash-preview, openai/gpt-5.5-pro, openai/gpt-5.5, openai/gpt-5.4, moonshotai/kimi-k2.6, moonshotai/kimi-k2.5, minimax/minimax-m2.5, qwen/qwen3.6-plus The conversation so far. System prompt plus user and assistant messages in chronological order.
A system prompt message that sets the behavior or context for the model.
- Option 1
- Option 2
- Option 3
- Option 4
- Option 5
Tool definitions. Supports custom function tools and the built-in octen_search server tool.
Controls tool invocation. none: never call tools; auto: model decides (default); required: must call a tool. Can also be an object to force a specific tool. Only valid when tools is set.
none, auto, required Whether the model may issue multiple tool calls in one reply. When false, at most one tool call per turn.
Whether to enable streaming output. When true, returns chat.completion.chunk objects incrementally.
Maximum number of tokens the model can output. If not set, the model's internal default limit is used.
x >= 1Maximum completion tokens, including reasoning and visible output tokens. If not set, the model's internal default limit is used.
x >= 1Controls randomness in generation. Higher values produce more diverse output; lower values produce more deterministic output.
0 <= x <= 2Nucleus sampling. Only tokens with cumulative probability up to top_p are considered.
x <= 1Sample only from the top K most probable tokens. 0 disables it.
x >= 0Minimum probability threshold relative to the most probable token. Tokens below it are filtered out. 0 disables it.
0 <= x <= 1Dynamic filtering threshold based on the most probable token. 0 disables it.
0 <= x <= 1Penalizes tokens already present in the input. Above 1 suppresses repetition; below 1 encourages it.
x <= 2Penalizes tokens by their frequency in the output so far. Positive values reduce repetition.
-2 <= x <= 2Penalizes tokens that have already appeared. Positive values encourage new topics.
-2 <= x <= 2Controls the output format. Some models may not support structured output and will automatically fall back to text.
Stop sequences. Generation stops when any of these strings is encountered.
Seed for reproducibility. With the same parameters and model version, output should be as consistent as possible.
Options for reasoning models. Sets the thinking effort and budget.
Controls how verbose the reply is.
low, medium, high A JSON object mapping token IDs to bias values (-100 to 100), added to the logits before sampling.
Whether to return the log probabilities of the output tokens.
Number of most likely tokens to return at each position. Requires logprobs to be true.
0 <= x <= 20A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid passing personally identifiable information.
The id of a previous response, used to chain state across turns. Only effective for Responses-API models; the gen--prefixed ids of regular completions cannot be used.
Response
Successful chat completion. When stream=false, returns a single chat.completion object. When stream=true, returns a stream of chat.completion.chunk objects (search_done, content, finish, usage), followed by data: [DONE].
- Option 1
- Option 2
A non-streaming chat completion response. Returned when stream=false.
The unique identifier for this request.
The object type, always chat.completion for non-streaming responses.
chat.completion Unix timestamp (in seconds) of when the completion was created.
The model used for this completion.
A list of completion choices.
Search results. Present only when octen_search was actually triggered.
Token usage information. When stream=true, returned only in the final usage chunk.
Warning message, if any.