POST /v1/chat/completions
curl --request POST \
  --url https://api.octen.ai/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [
    {
      "role": "system",
      "content": "Be precise and concise."
    },
    {
      "role": "user",
      "content": "How many stars are there in our galaxy?"
    }
  ],
  "web_search": "on",
  "stream": false,
  "max_tokens": 2048,
  "temperature": 1
}
'
{
  "request_id": "gen-1749812345-abcd1234",
  "object": "chat.completion",
  "created": 1749812345,
  "model": "anthropic/claude-sonnet-4.6",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "According to search results..."
      }
    }
  ],
  "search_results": [
    {
      "query": "What is OpenRouter AI",
      "results": [
        {
          "title": "OpenRouter Documentation",
          "url": "https://openrouter.ai/docs",
          "highlight": "OpenRouter provides a unified API for multiple LLM providers",
          "full_content": "OpenRouter is a platform that allows developers to access multiple large language models through a single API...",
          "authors": "OpenRouter",
          "time_published": "2024-11-02T00:00:00Z",
          "time_last_crawled": "2025-01-10T12:30:00Z"
        }
      ],
      "latency": 62
    }
  ],
  "meta": {
    "usage": {
      "num_search_queries": 1,
      "full_content_tokens": 0,
      "prompt_tokens": 58,
      "completion_tokens": 32,
      "total_tokens": 90,
      "reasoning_tokens": 0
    },
    "latency": 1200
  }
}

Documentation Index

Fetch the complete documentation index at: https://docs.octen.ai/llms.txt

Use this file to discover all available pages before exploring further.

Authorizations

x-api-key
string
header
required

API key used for request authentication. Obtain an API key before using the API. Note: A payment method is required to use the API.
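
The header-based authentication above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the `OCTEN_API_KEY` environment variable name and the `build_request` helper are assumptions made for this example, while the URL, header name, and body fields are taken from the curl example on this page.

```python
import json
import os
import urllib.request

API_URL = "https://api.octen.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the x-api-key header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


payload = {
    "model": "anthropic/claude-sonnet-4.6",
    "messages": [
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "How many stars are there in our galaxy?"},
    ],
    "web_search": "on",
    "stream": False,
}

# OCTEN_API_KEY is a hypothetical variable name chosen for this sketch.
req = build_request(os.environ.get("OCTEN_API_KEY", "<api-key>"), payload)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.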

Body

application/json

Request body for the Web Chat API.

model
enum<string>
required

The model to use for chat completion.

Available options:
anthropic/claude-sonnet-4.6,
anthropic/claude-opus-4.6,
anthropic/claude-haiku-4.5,
google/gemini-3-flash-preview,
google/gemini-3.1-pro-preview,
google/gemini-3.1-flash-lite-preview,
openai/gpt-5.4,
openai/gpt-oss-120b,
moonshotai/kimi-k2.5,
minimax/minimax-m2.5
messages
object[]
required

A list of messages comprising the conversation so far: the system prompt, followed by user and assistant messages in chronological order.

A single message in the conversation. Discriminated by role.

web_search
enum<string>

Whether to enable search augmentation.

Available options:
auto,
on,
off
web_search_options
object

Search-related options for the Web Chat API. All parameters are optional and share the same semantics and defaults as the Search API. The query is automatically generated from the messages and does not need to be provided.

stream
boolean
default:false

Whether to enable streaming output. When true, returns chat.completion.chunk objects incrementally.

max_tokens
integer

Maximum number of tokens the model can output. If not set, the model's internal default limit is used.

Required range: x >= 1
max_completion_tokens
integer

Maximum completion tokens (including reasoning tokens and visible output tokens). If not set, the model's internal default limit is used.

Required range: x >= 1
temperature
number
default:1

Controls randomness in generation. Higher values produce more diverse output; lower values produce more deterministic output.

Required range: 0 <= x <= 2
top_p
number
default:1

Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered. Smaller values produce more conservative output.

Required range: x <= 1
frequency_penalty
number
default:0

Penalizes tokens based on their frequency in the output so far. Positive values reduce repetition; negative values encourage consistency.

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Penalizes tokens that have already appeared in the output. Positive values encourage topic diversity; negative values encourage focus.

Required range: -2 <= x <= 2
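
The documented ranges for the numeric parameters can be checked on the client before a request is sent. The sketch below is one possible pre-flight check. The `RANGES` table and `check_ranges` helper are hypothetical names for this example, and the bounds are copied from the "Required range" lines above (`top_p` is listed with an upper bound only, so no lower bound is enforced here).

```python
from typing import List

# (lower bound, upper bound); None means the page states no bound on that side.
RANGES = {
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
    "temperature": (0, 2),
    "top_p": (None, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def check_ranges(payload: dict) -> List[str]:
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in payload:
            continue  # optional parameter not supplied
        value = payload[name]
        if low is not None and value < low:
            errors.append(f"{name} must be >= {low}, got {value}")
        if high is not None and value > high:
            errors.append(f"{name} must be <= {high}, got {value}")
    return errors


print(check_ranges({"temperature": 2.5, "max_tokens": 2048}))
```

The server remains the authority on validation; a local check like this only catches obvious mistakes earlier.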
response_format
object

Controls the output format. Some models may not support structured output and will automatically fall back to text.

stop
string[]

Stop sequences. Generation stops when any of these strings is encountered. May not be supported by all models.

seed
integer

Seed for reproducibility. With the same parameters and model version, output should be as consistent as possible.

reasoning
object

Options for reasoning models. When stream=false, reasoning models (google/gemini-3.1-pro-preview, moonshotai/kimi-k2.5, openai/gpt-oss-120b, minimax/minimax-m2.5) include their reasoning process as <think>...</think> tags within the content field. When stream=true, reasoning content is delivered via a separate delta.reasoning_content field (not in content). In both cases, the token count is reported in reasoning_tokens in the usage metadata.
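
For non-streaming responses, the `<think>...</think>` tags described above can be separated from the visible answer on the client. The following is a minimal sketch. `split_reasoning` is a hypothetical helper, and it assumes the tags are well formed and not nested.

```python
import re
from typing import Tuple

# Non-greedy match so multiple <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> Tuple[str, str]:
    """Return (reasoning, answer) extracted from a message content string."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(content))
    answer = THINK_RE.sub("", content).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>Estimates vary by method.</think>Roughly 100 to 400 billion."
)
print(reasoning)
print(answer)
```

Content without any tags is returned unchanged as the answer, with an empty reasoning string.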

logit_bias
object

A JSON object mapping token IDs to bias values (-100 to 100). The bias is added to the model's logits before sampling.

tools
object[]

Tool definitions for function calling. Follows the OpenAI tool calling format; non-OpenAI models are automatically converted.

tool_choice

Controls tool invocation behavior. none: never call tools; auto: model decides; required: must call a tool. Can also be an object to specify a particular function: {"type": "function", "function": {"name": "my_function"}}.

Available options:
none,
auto,
required
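
As an illustration of `tools` and the object form of `tool_choice`, the sketch below assembles a request body in the OpenAI tool calling format that this page references. The function name `lookup_star_count`, its parameter schema, and the two helper functions are hypothetical and exist only for this example.

```python
import json


def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema parameter spec as an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def force_function(name: str) -> dict:
    """Build the object form of tool_choice that pins one function."""
    return {"type": "function", "function": {"name": name}}


tools = [
    function_tool(
        "lookup_star_count",  # hypothetical function name
        "Return an estimated star count for a named galaxy.",
        {
            "type": "object",
            "properties": {"galaxy": {"type": "string"}},
            "required": ["galaxy"],
        },
    )
]

payload = {
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "How many stars are in the Milky Way?"}],
    "tools": tools,
    "tool_choice": force_function("lookup_star_count"),
}

print(json.dumps(payload, indent=2))
```

Replacing the `tool_choice` value with the string `"auto"` would leave the decision to the model instead of forcing the call.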
user
string

A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid passing personally identifiable information.

Response

Successful chat completion response. When stream=false, returns a single chat.completion object. When stream=true, returns a stream of chat.completion.chunk objects with types: search_done (search results), content (incremental content), finish (completion signal), and usage (token usage).

A non-streaming chat completion response. Returned when stream=false.

request_id
string
required

The unique identifier for this request.

object
enum<string>
required

The object type, always chat.completion for non-streaming responses.

Available options:
chat.completion
created
number
required

Unix timestamp (in seconds) of when the completion was created.

model
string
required

The model used for this completion.

choices
object[]
required

A list of completion choices.

search_results
object[]

Search results used to augment the response. Only present when search was used.

meta
object

Metadata for the chat completion response.

warning
string | null

Warning message, if any.
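
The response fields above can be read defensively, since `search_results`, `meta`, and `warning` may be absent. The sketch below uses an abridged copy of the sample response from this page. `summarize` is a hypothetical helper, not part of any SDK.

```python
# Abridged version of the sample non-streaming response shown on this page.
sample = {
    "request_id": "gen-1749812345-abcd1234",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "According to search results...",
            },
        }
    ],
    "search_results": [
        {
            "query": "What is OpenRouter AI",
            "results": [
                {
                    "title": "OpenRouter Documentation",
                    "url": "https://openrouter.ai/docs",
                }
            ],
            "latency": 62,
        }
    ],
    "meta": {"usage": {"prompt_tokens": 58, "completion_tokens": 32, "total_tokens": 90}},
}


def summarize(response: dict) -> dict:
    """Pull out the answer text, cited URLs, token total, and any warning."""
    message = response["choices"][0]["message"]
    # search_results is only present when search was used, so default to [].
    sources = [
        item["url"]
        for group in response.get("search_results", [])
        for item in group.get("results", [])
    ]
    usage = response.get("meta", {}).get("usage", {})
    return {
        "answer": message["content"],
        "sources": sources,
        "total_tokens": usage.get("total_tokens"),
        "warning": response.get("warning"),
    }


print(summarize(sample))
```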