Creates a chat completion with web search augmentation. Supports multiple LLM providers, streaming, tool calling, structured output, and reasoning models.
API key used for request authentication. Obtain an API key before using the API.
Request body for the Web Chat API.
The model to use for chat completion. Supported values: anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.6, anthropic/claude-haiku-4.5, google/gemini-3-flash-preview, google/gemini-3.1-pro-preview, google/gemini-3.1-flash-lite-preview, openai/gpt-5.4, openai/gpt-oss-120b, moonshotai/kimi-k2.5, minimax/minimax-m2.5.
A list of messages comprising the conversation so far: the system prompt followed by user and assistant messages in chronological order.
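A minimal sketch of the request body described above, built as a plain dict. The field name for the web-search switch (`web_search`) is an assumption for illustration; the model list and message ordering come from this reference.

```python
# Minimal Web Chat API request body as a plain dict.
# "web_search" is an assumed field name for the on/off search switch.
payload = {
    "model": "anthropic/claude-sonnet-4.6",
    "messages": [
        # System prompt first, then user/assistant turns in chronological order.
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What changed in the latest release?"},
    ],
    "web_search": "on",  # "on" (default) or "off"
}
```

This dict would be serialized as the JSON request body; authentication uses the API key described above.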
A single message in the conversation. Discriminated by role.
Whether to enable web search augmentation. Allowed values: on, off. on enables web search (default); off disables it and queries the model directly.
Search-related options for the Web Chat API. All parameters are optional and share the same semantics and defaults as the Web Search API. The query is generated automatically from the messages and does not need to be provided.
Whether to enable streaming output. When true, returns chat.completion.chunk objects incrementally.
Maximum number of tokens the model can output (x >= 1). If not set, the model's internal default limit is used.
Maximum completion tokens, including reasoning tokens and visible output tokens (x >= 1). If not set, the model's internal default limit is used.
Controls randomness in generation (0 <= x <= 2). Higher values produce more diverse output; lower values produce more deterministic output.
Nucleus sampling parameter (x <= 1). Only tokens with cumulative probability up to top_p are considered. Smaller values produce more conservative output.
Penalizes tokens based on their frequency in the output so far (-2 <= x <= 2). Positive values reduce repetition; negative values encourage consistency.
Penalizes tokens that have already appeared in the output (-2 <= x <= 2). Positive values encourage topic diversity; negative values encourage focus.
Controls the output format. Some models may not support structured output and will automatically fall back to text.
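A sketch of a structured-output request combining the sampling parameters above with response_format. The JSON-schema shape shown here follows the OpenAI convention; treat the exact layout as an assumption, since (as noted above) some models fall back to plain text regardless.

```python
# Hypothetical structured-output request. The response_format layout
# follows the OpenAI-style JSON-schema convention (an assumption).
payload = {
    "model": "openai/gpt-5.4",
    "messages": [
        {"role": "user", "content": "Extract the city and country from: 'I flew to Paris, France.'"},
    ],
    "temperature": 0.2,  # 0 <= x <= 2; low value for deterministic extraction
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}
```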
Stop sequences. Generation stops when any of these strings is encountered. May not be supported by all models.
Seed for reproducibility. With the same parameters and model version, output should be as consistent as possible.
Options for reasoning models.
A JSON object mapping token IDs to bias values (-100 to 100). The bias is added to the model's logits before sampling.
Whether to return log probabilities of output tokens.
Number of most likely tokens to return at each position with their log probabilities (0 <= x <= 20). Requires logprobs to be true.
Tool definitions for function calling. Follows the OpenAI tool-calling format; definitions are converted automatically for non-OpenAI models.
Controls tool invocation behavior. Allowed values: none (never call tools), auto (model decides), required (must call a tool). Can also be an object specifying a particular function: {"type": "function", "function": {"name": "my_function"}}.
A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid passing personally identifiable information.
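The tool-calling fields above can be sketched as follows. The get_weather function is a hypothetical example; only the outer {"type": "function", ...} shapes for tools and tool_choice come from this reference.

```python
# Tool definition in the OpenAI tool-calling format described above.
# get_weather is a hypothetical function used for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "anthropic/claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    # Object form of tool_choice: force a call to one named function
    # instead of the none/auto/required string values.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```

Passing "auto" (or omitting tool_choice) instead would let the model decide whether to call the tool.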
Successful chat completion response. When stream=false, returns a single chat.completion object. When stream=true, returns a stream of chat.completion.chunk objects with types: search_done (search results), content (incremental content), finish (completion signal), and usage (token usage).
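The four streaming chunk types listed above (search_done, content, finish, usage) can be dispatched as in this sketch. The chunk dicts are illustrative stand-ins for parsed chat.completion.chunk objects; their exact field layout is an assumption, while the type names come from this reference.

```python
# Accumulate a streamed response by dispatching on the chunk's type.
# Field names inside each chunk (delta, search_results, usage) are
# assumptions for illustration.
def handle_chunk(chunk, state):
    kind = chunk["type"]
    if kind == "search_done":
        # Search results arrive once, before the model's content.
        state["search_results"] = chunk.get("search_results", [])
    elif kind == "content":
        # Incremental text; append to the running output.
        state["text"] += chunk.get("delta", "")
    elif kind == "finish":
        state["done"] = True
    elif kind == "usage":
        state["usage"] = chunk.get("usage")
    return state

state = {"text": "", "done": False}
for chunk in [
    {"type": "search_done", "search_results": [{"url": "https://example.com"}]},
    {"type": "content", "delta": "Hello"},
    {"type": "content", "delta": " world"},
    {"type": "finish"},
    {"type": "usage", "usage": {"total_tokens": 12}},
]:
    state = handle_chunk(chunk, state)
# state["text"] is now "Hello world"
```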
A non-streaming chat completion response. Returned when stream=false.
The unique identifier for this request.
The object type, always chat.completion for non-streaming responses.
Unix timestamp (in seconds) of when the completion was created.
The model used for this completion.
A list of completion choices.
Search results used to augment the response. Only present when web search was used.
Metadata for the chat completion response.
Warning message, if any.