POST /v1/chat/completions

Example request:

curl --request POST \
  --url https://api.octen.ai/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "model": "anthropic/claude-opus-4.8",
  "messages": [
    {
      "role": "user",
      "content": "Explain attention in one sentence."
    }
  ],
  "stream": false,
  "max_tokens": 2048,
  "temperature": 1
}
'

Example response:

{
  "id": "gen-1749812456-xyz7890",
  "object": "chat.completion",
  "created": 1749812456,
  "model": "anthropic/claude-opus-4.8",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Attention lets a model dynamically weight its inputs and focus on the most relevant information.",
        "refusal": null,
        "reasoning": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 32,
    "total_tokens": 50
  }
}

Authorizations

x-api-key
string
header
required

API key used to authenticate requests. Obtain an API key before calling the API. Note: a payment method is required to use the API.
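
A minimal Python sketch of an authenticated request using only the standard library. It builds the request without sending it. The environment variable name `OCTEN_API_KEY` and the helper `build_request` are hypothetical, not part of the API.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.octen.ai/v1/chat/completions"


def build_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request authenticated via the x-api-key header."""
    # OCTEN_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("OCTEN_API_KEY")
    if not key:
        raise ValueError("missing API key for the x-api-key header")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": key},
        method="POST",
    )


req = build_request(
    {
        "model": "anthropic/claude-opus-4.8",
        "messages": [{"role": "user", "content": "Explain attention in one sentence."}],
    },
    api_key="test-key",
)
# Sending it: urllib.request.urlopen(req) would return the JSON body described under Response.
```

`urllib` stores header names capitalized (`X-api-key`); HTTP header names are case-insensitive, so this still matches the documented `x-api-key`.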

Body

application/json

Request body for the Chat Completions API. Some parameters apply only to certain models; unsupported parameters are ignored for the selected model.

model
enum<string>
required

The model to use for chat completion.

Available options:
anthropic/claude-opus-4.8,
anthropic/claude-opus-4.6,
anthropic/claude-sonnet-4.6,
anthropic/claude-haiku-4.5,
google/gemini-3.5-flash,
google/gemini-3.1-pro-preview,
google/gemini-3.1-flash-lite,
google/gemini-3-flash-preview,
openai/gpt-5.5-pro,
openai/gpt-5.5,
openai/gpt-5.4,
moonshotai/kimi-k2.6,
moonshotai/kimi-k2.5,
minimax/minimax-m2.5,
qwen/qwen3.6-plus

messages
object[]
required

The conversation so far: a system prompt plus user and assistant messages, in chronological order.

The system prompt message sets the behavior or context for the model.
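
A short sketch of assembling `messages` in chronological order. It assumes each message is a `{"role", "content"}` object as in the request example above and that the system prompt uses the role `"system"`; the helper `build_messages` is hypothetical.

```python
def build_messages(system_prompt, turns):
    """Return a messages array: the system prompt first, then the turns in order.

    `turns` is a list of (role, content) pairs; each role must be "user" or "assistant".
    The "system" role name is an assumption based on common chat-completion APIs.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        messages.append({"role": role, "content": content})
    return messages


messages = build_messages(
    "You are a concise ML tutor.",
    [
        ("user", "What is attention?"),
        ("assistant", "A way for a model to weight its inputs by relevance."),
        ("user", "Now explain it in one sentence."),
    ],
)
```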

tools
object[]

Tool definitions. Supports custom function tools and the built-in octen_search server tool.

tool_choice

Controls whether and how the model calls tools. none: never call tools; auto: the model decides (default); required: the model must call a tool. You can also pass an object to force a specific tool. Only valid when tools is set.

Available options:
none,
auto,
required

parallel_tool_calls
boolean
default:true

Whether the model may issue multiple tool calls in one reply. When false, at most one tool call per turn.
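
A sketch of a request body that combines `tools`, `tool_choice`, and `parallel_tool_calls`, with a client-side check for the documented rule that `tool_choice` is only valid when `tools` is set. The function-tool shape and the object form of `tool_choice` follow the OpenAI-compatible convention and are assumptions here, as are the names `get_weather` and `with_tools`; the format of the built-in `octen_search` tool is not shown because it is not specified on this page.

```python
import json

# Hypothetical custom function tool. The OpenAI-compatible shape is an assumption;
# check the tools schema (and the octen_search tool format) before relying on it.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def with_tools(payload, tools, tool_choice=None, parallel_tool_calls=None):
    """Return a copy of `payload` with tool settings, enforcing the documented rules."""
    if tool_choice is not None and not tools:
        raise ValueError("tool_choice is only valid when tools is set")
    if isinstance(tool_choice, str) and tool_choice not in ("none", "auto", "required"):
        raise ValueError(f"invalid tool_choice: {tool_choice!r}")
    body = dict(payload)
    if tools:
        body["tools"] = tools
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if parallel_tool_calls is not None:
        body["parallel_tool_calls"] = parallel_tool_calls
    return body


base = {
    "model": "openai/gpt-5.5",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
}
# Force get_weather (object form assumed OpenAI-compatible), at most one call per turn.
body = with_tools(
    base,
    [get_weather],
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
    parallel_tool_calls=False,
)
print(json.dumps(body, indent=2))
```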

stream
boolean
default:false

Whether to enable streaming output. When true, returns chat.completion.chunk objects incrementally.

max_tokens
integer

Maximum number of tokens the model can output. If not set, the model's internal default limit is used.

Required range: x >= 1

max_completion_tokens
integer

Maximum completion tokens, including reasoning and visible output tokens. If not set, the model's internal default limit is used.

Required range: x >= 1

temperature
number
default:1

Controls randomness in generation. Higher values produce more diverse output; lower values produce more deterministic output.

Required range: 0 <= x <= 2
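
To illustrate why higher temperatures give more diverse output, here is the standard temperature-scaled softmax, p_i ∝ exp(z_i / T). This is the common textbook definition, not a statement of how any particular provider implements it; treating `temperature = 0` as greedy decoding is likewise an assumed convention.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding (all mass on the argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top token; higher temperatures flatten the distribution.
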

top_p
number
default:1

Nucleus sampling: only the smallest set of most probable tokens whose cumulative probability reaches top_p is considered.

Required range: x <= 1
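
A sketch of the nucleus (top-p) filter as it is usually defined: sort tokens by probability, keep adding them until the running total reaches `top_p`, then renormalize over the kept set. The toy distribution is made up for illustration.

```python
def top_p_filter(probs, top_p):
    """Nucleus sampling filter: keep the smallest set of most probable tokens whose
    cumulative probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.75))  # tokens 0 and 1 (0.5 + 0.3 reaches 0.75)
print(top_p_filter(probs, 1.0))   # every token survives
```
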

top_k
integer
default:0

Sample only from the top K most probable tokens. 0 disables it.

Required range: x >= 0

min_p
number
default:0

Minimum probability threshold relative to the most probable token. Tokens below it are filtered out. 0 disables it.

Required range: 0 <= x <= 1
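
The two filters above can be sketched as follows, using their common definitions: top-k keeps the K most probable tokens, and min-p drops tokens whose probability is below `min_p` times the top token's probability. Both treat 0 as "disabled", as documented. The returned probabilities are not renormalized, and the toy distribution is made up.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens; k == 0 disables the filter."""
    if k == 0:
        return dict(enumerate(probs))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {i: probs[i] for i in order}


def min_p_filter(probs, min_p):
    """Drop tokens below min_p * (highest probability); min_p == 0 keeps everything."""
    threshold = min_p * max(probs)
    return {i: p for i, p in enumerate(probs) if p >= threshold}


probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))    # tokens 0 and 1
print(min_p_filter(probs, 0.2))  # threshold 0.1 keeps tokens 0, 1, 2
```

Because min-p scales with the top probability, it prunes more aggressively when the model is confident, whereas top-k always keeps exactly K tokens.
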

top_a
number
default:0

Dynamic filtering threshold based on the most probable token. 0 disables it.

Required range: 0 <= x <= 1
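
This page does not give the exact top_a formula. The sketch below uses the definition commonly cited for top-a sampling, which drops tokens with probability below `top_a * p_max**2`. It is an assumption about how this parameter behaves here.

```python
def top_a_filter(probs, top_a):
    """Drop tokens below top_a * (max probability) ** 2; top_a == 0 keeps everything.

    The squared-max threshold is the commonly cited top-a rule, assumed here.
    """
    p_max = max(probs)
    threshold = top_a * p_max ** 2
    return {i: p for i, p in enumerate(probs) if p >= threshold}


# Confident distribution: p_max = 0.9, so the threshold is 0.2 * 0.81 = 0.162.
print(top_a_filter([0.9, 0.07, 0.03], 0.2))   # only token 0 survives
# Flat distribution: p_max = 0.4, so the threshold is 0.2 * 0.16 = 0.032.
print(top_a_filter([0.4, 0.35, 0.25], 0.2))   # all tokens survive
```

Squaring `p_max` makes the cutoff shrink quickly as the distribution flattens. That is why top-a is often described as a dynamic filter.
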

repetition_penalty
number
default:1

Penalizes tokens already present in the input. Above 1 suppresses repetition; below 1 encourages it.

Required range: x <= 2
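
A sketch of the CTRL-style repetition penalty, a widely used formulation: for each token already in the context, divide a positive logit by the penalty and multiply a negative one by it. This page does not state the exact formula used by the API, so treat this as an illustration of the documented behavior (above 1 suppresses repetition, below 1 encourages it).

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """CTRL-style penalty (assumed formulation). For tokens already seen, divide
    positive logits by `penalty` and multiply negative logits by it, so
    penalty > 1 lowers their scores, penalty < 1 raises them, and 1 is a no-op."""
    out = list(logits)
    for i in set(seen_token_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


# Tokens 0 and 1 already appeared; token 2 has not.
print(apply_repetition_penalty([2.0, -1.0, 0.5], seen_token_ids=[0, 1], penalty=1.25))
```
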

frequency_penalty
number
default:0

Penalizes tokens by their frequency in the output so far. Positive values reduce repetition.

Required range: -2 <= x <= 2

presence_penalty
number
default:0

Penalizes tokens that have already appeared. Positive values encourage new topics.

Required range: -2 <= x <= 2
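
A sketch of how these two penalties are commonly combined (the formulation OpenAI documents): `logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty`. That this API applies exactly this formula is an assumption.

```python
from collections import Counter


def apply_frequency_presence(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count * frequency_penalty plus a flat presence_penalty for every
    token that has appeared in the output so far (assumed formulation)."""
    counts = Counter(generated_ids)  # only tokens that appeared, so every count is > 0
    out = list(logits)
    for j, c in counts.items():
        out[j] -= c * frequency_penalty + presence_penalty
    return out


# Token 0 appeared 3 times and token 1 once in the output so far; token 2 never.
print(apply_frequency_presence([1.0, 1.0, 1.0], [0, 0, 0, 1],
                               frequency_penalty=0.5, presence_penalty=0.25))
```

The frequency term grows with each repetition. The presence term is a one-time cost for having appeared at all, which is why it nudges the model toward new topics.
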

response_format
object

Controls the output format. Some models may not support structured output and will automatically fall back to text.
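
Because unsupported models fall back to text, client code should not assume the reply parses as JSON. The sketch below requests JSON output using the OpenAI-compatible `{"type": "json_object"}` shape, which is an assumption, since this page does not list the accepted `response_format` values. It then parses the reply defensively. `parse_structured` is a hypothetical helper.

```python
import json

# Assumed OpenAI-compatible shape for requesting JSON output.
payload = {
    "model": "google/gemini-3.5-flash",
    "messages": [{
        "role": "user",
        "content": "Return a JSON object with keys 'term' and 'definition' for attention.",
    }],
    "response_format": {"type": "json_object"},
}


def parse_structured(content):
    """Return (data, is_json). Models without structured-output support fall back
    to text, so non-JSON content is returned as plain text instead of raising."""
    try:
        return json.loads(content), True
    except json.JSONDecodeError:
        return content, False


print(parse_structured('{"term": "attention", "definition": "input weighting"}'))
print(parse_structured("Attention weights inputs by relevance."))
```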

stop
string[]

Stop sequences. Generation stops when any of these strings is encountered.

seed
integer

Seed for reproducibility. With the same seed, parameters, and model version, output should be as consistent as possible.

reasoning
object

Options for reasoning models. Sets the thinking effort and budget.

verbosity
enum<string>
default:medium

Controls how verbose the reply is.

Available options:
low,
medium,
high

logit_bias
object

A JSON object mapping token IDs to bias values (-100 to 100), added to the logits before sampling.
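
A sketch of what the bias does to the logits. JSON object keys are strings, so token IDs arrive as strings and must be converted. Token IDs are tokenizer-specific; the IDs below are made up, and `apply_logit_bias` is a hypothetical helper that mirrors the documented -100 to 100 range.

```python
def apply_logit_bias(logits, logit_bias):
    """Add per-token biases to the logits; keys are token IDs as strings (JSON keys)."""
    out = list(logits)
    for token_id, bias in logit_bias.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token_id} out of range: {bias}")
        out[int(token_id)] += bias
    return out


# -100 all but bans token 1; +5 strongly favors token 2.
print(apply_logit_bias([1.0, 3.0, 0.0], {"1": -100, "2": 5}))
```
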

logprobs
boolean
default:false

Whether to return the log probabilities of the output tokens.

top_logprobs
integer

Number of most likely tokens to return at each position. Requires logprobs to be true.

Required range: 0 <= x <= 20

user
string

A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid passing personally identifiable information.
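
One way to produce a stable pseudonymous `user` value is a keyed hash (HMAC-SHA256) of your internal user ID. Without the server-side secret, a third party cannot recover or re-derive the original identifier, for example by hashing a list of email addresses. The helper name and the secret below are placeholders.

```python
import hashlib
import hmac


def pseudonymous_user_id(internal_user_id: str, secret: bytes) -> str:
    """Derive a stable identifier for the `user` field with HMAC-SHA256.

    The same input and secret always give the same 64-character hex string,
    so per-user tracking still works without sending the raw identifier.
    """
    return hmac.new(secret, internal_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"replace-with-a-server-side-secret"  # placeholder; keep the real one out of source control
print(pseudonymous_user_id("alice@example.com", secret))
```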

previous_response_id
string

The id of a previous response, used to chain state across turns. Only effective for Responses API models; ids prefixed with gen- (as returned by regular chat completions) cannot be used.
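
A small client-side guard for the rule above that rejects `gen-` ids before the request is sent. Which models count as Responses API models is not listed on this page. The model name, the id `resp_123`, and the helper `chain_request` are placeholders.

```python
def chain_request(payload, previous_id):
    """Return a copy of payload that continues from previous_id.

    Rejects gen- ids, which regular chat completions return and which
    previous_response_id cannot use.
    """
    if previous_id.startswith("gen-"):
        raise ValueError("gen- ids from chat completions cannot be used as previous_response_id")
    return dict(payload, previous_response_id=previous_id)


payload = {
    "model": "openai/gpt-5.5",  # placeholder; use a model that supports the Responses API
    "messages": [{"role": "user", "content": "And in two sentences?"}],
}
# "resp_123" is a made-up id standing in for one returned by a Responses API model.
next_payload = chain_request(payload, "resp_123")
```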

Response

Successful chat completion. When stream=false, returns a single chat.completion object. When stream=true, returns a stream of chat.completion.chunk objects (search_done, content, finish, usage), followed by data: [DONE].
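
A sketch of consuming the streamed form. It reads server-sent `data:` lines until `data: [DONE]`. It assumes OpenAI-style chunks, where text arrives in `choices[].delta.content` and token counts arrive in a final chunk carrying `usage`. Chunks without content (such as a search_done chunk) are simply skipped. The sample lines are invented for illustration.

```python
import json


def collect_stream(lines):
    """Accumulate streamed text and the final usage object from SSE lines.

    Assumes OpenAI-style chat.completion.chunk objects (delta.content for text,
    a final chunk with "usage"); chunks without content are ignored.
    """
    text, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text.append(delta["content"])
    return "".join(text), usage


sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Attention "}}]}',
    '',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "weights inputs."}, "finish_reason": "stop"}]}',
    'data: {"object": "chat.completion.chunk", "choices": [], "usage": {"prompt_tokens": 18, "completion_tokens": 4, "total_tokens": 22}}',
    'data: [DONE]',
]
print(collect_stream(sample))
```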

A non-streaming chat completion response. Returned when stream=false.

id
string
required

The unique identifier for this request.

object
enum<string>
required

The object type, always chat.completion for non-streaming responses.

Available options:
chat.completion

created
integer
required

Unix timestamp (in seconds) of when the completion was created.

model
string
required

The model used for this completion.

choices
object[]
required

A list of completion choices.

search_results
object[]

Search results. Present only when octen_search was actually triggered.

usage
object

Token usage information. When stream=true, returned only in the final usage chunk.

warning
string

Warning message, if any.
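
A sketch that pulls the commonly needed fields out of a non-streaming response, using the example response from the top of this page as input. In practice the dict would come from `json.loads` on the HTTP body. `summarize` is a hypothetical helper, and the optional fields (`search_results`, `warning`) are read defensively because they are not always present.

```python
def summarize(response: dict) -> dict:
    """Extract the reply text, finish reason, token count, and optional fields."""
    if response.get("object") != "chat.completion":
        raise ValueError("expected a non-streaming chat.completion object")
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": (response.get("usage") or {}).get("total_tokens"),
        "searched": bool(response.get("search_results")),  # present only if octen_search ran
        "warning": response.get("warning"),
    }


# The example response shown at the top of this page.
response = {
    "id": "gen-1749812456-xyz7890",
    "object": "chat.completion",
    "created": 1749812456,
    "model": "anthropic/claude-opus-4.8",
    "choices": [{
        "index": 0,
        "finish_reason": "stop",
        "message": {
            "role": "assistant",
            "content": "Attention lets a model dynamically weight its inputs and focus on the most relevant information.",
            "refusal": None,
            "reasoning": None,
        },
    }],
    "usage": {"prompt_tokens": 18, "completion_tokens": 32, "total_tokens": 50},
}
print(summarize(response))
```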