# Rate Limits

Octen enforces rate limits to ensure platform stability and fair usage across all customers.
Rate limits define how frequently you can call the API within a given time window. Requests that exceed these limits will be temporarily rejected.

## Search API, Embedding API & VL Embedding API

The following QPS (Queries Per Second) limits are shared across the **Search API**, **Embedding API**, and **VL Embedding API**:

| Subscription | QPS Limit |
| :----------- | :-------- |
| Free         | 5         |
| Base         | 20        |
| Pro          | 50        |
| Scale        | 100       |
| Enterprise   | Custom    |

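One way to stay under a QPS limit is to pace requests on the client. The sketch below is illustrative only: `QpsPacer` is a hypothetical helper, not part of any Octen SDK, and it simply spaces calls at least `1 / qps` seconds apart. Because the limit is shared across the three APIs, it assumes a single pacer instance guards all of them.

```python
import time

class QpsPacer:
    """Client-side pacer (hypothetical helper): spaces calls 1/qps seconds apart."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock          # injectable so the pacer can be tested
        self.sleep = sleep
        self.next_allowed = float("-inf")

    def wait(self):
        """Block until the next request may be sent; return the time slept."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # Schedule the following slot one interval after this one.
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay
```

Call `pacer.wait()` immediately before each request. Pacing slightly below the published limit (for example `QpsPacer(4)` on the Free tier) leaves headroom for clock skew and network jitter.
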
### How rate limits are applied

Rate limits are determined by your account plan and optional per-key caps:

* Each account has a default rate limit based on its subscription tier.
* When creating an API key, you may configure an additional rate limit for that key.
* If both are configured, requests are throttled by whichever limit is lower (account or API key).
* Limits differ by API: Extract, Web Chat & Broad Search, and Deep Research each have their own limits, listed in the sections below.
* Limits may vary depending on your usage tier or agreement.

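The "whichever limit is lower" rule above can be sketched as a small function. The tier values are copied from the QPS table; the function name and the `key_limit` parameter are illustrative assumptions, not an Octen API.

```python
from typing import Optional

# Tier defaults from the QPS table; Enterprise is custom, so it is omitted.
TIER_QPS = {"Free": 5, "Base": 20, "Pro": 50, "Scale": 100}

def effective_qps(tier: str, key_limit: Optional[int] = None) -> int:
    """Return the QPS that applies: the lower of the account and key limits."""
    account_limit = TIER_QPS[tier]
    if key_limit is None:
        return account_limit
    return min(account_limit, key_limit)

print(effective_qps("Pro"))       # 50: no key limit, account limit applies
print(effective_qps("Pro", 10))   # 10: key limit is lower
print(effective_qps("Free", 10))  # 5: account limit is lower
```
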
## Extract

The **Extract API** uses a rate limiting model based on requests per minute (RPM):

| Metric | Limit |
| :----- | :---- |
| RPM    | 100   |

## Web Chat & Broad Search

Web Chat and Broad Search use a rate limiting model based on requests per minute (RPM) and tokens per minute (TPM):

| Metric | Limit   |
| :----- | :------ |
| RPM    | 20      |
| TPM    | 500,000 |

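A client can track both budgets locally before sending a request. The sketch below assumes a sliding 60-second window and a caller-supplied token estimate; how Octen counts tokens and whether its window is fixed or sliding are not stated here, so treat `MinuteBudget` as a hypothetical approximation.

```python
import time
from collections import deque

class MinuteBudget:
    """Tracks requests and tokens over the last 60 seconds (client-side estimate)."""

    def __init__(self, rpm=20, tpm=500_000, clock=time.monotonic):
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events = deque()   # (timestamp, tokens) per accepted request
        self.tokens_used = 0

    def _expire(self, now):
        # Drop requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_used -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if both budgets allow it."""
        now = self.clock()
        self._expire(now)
        if len(self.events) >= self.rpm or self.tokens_used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_used += tokens
        return True
```

When `try_acquire` returns `False`, delay the request rather than sending it and receiving a `429`.
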
## Deep Research

Deep Research supports only one concurrent request. Wait for the in-flight request to finish before sending another.

## What happens when you exceed a limit

If a request exceeds the allowed rate:

* The request is rejected with a rate limit error.
* HTTP status: `429`
* Response body: `code=429`, with a `msg` that states which limit was hit (account or API key) and the value of that limit.

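A client might detect this error as follows. The sketch assumes the body is JSON with `code` and `msg` fields, which this page does not confirm; the function name and the sample message are invented for illustration.

```python
import json
from typing import Optional

def rate_limit_reason(status: int, body: str) -> Optional[str]:
    """Return the `msg` text for a rate limit rejection, or None otherwise."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Body was not JSON (assumption violated): fall back to the raw text.
        return body or "rate limited"
    if isinstance(payload, dict):
        return str(payload.get("msg", "rate limited"))
    return "rate limited"

# Hypothetical response body, for illustration only.
sample = '{"code": 429, "msg": "example: API key limit exceeded"}'
print(rate_limit_reason(429, sample))
```
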
## Recommended retry behavior

When receiving a rate limit error:

* Check the `msg` field to see which limit was hit (account or API key) and its value.
* Avoid immediate retries in a tight loop.
* Resume requests after the rate limit window resets: QPS limits are counted per second, RPM and TPM limits per minute.
* For high-throughput or bursty workloads, batching requests where supported can help reduce pressure on rate limits.

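One common way to avoid tight retry loops is exponential backoff with jitter, sketched below. This is a general technique rather than a documented Octen requirement; `send` is a hypothetical zero-argument function returning `(status, body)`, and the delay values are illustrative defaults.

```python
import random
import time

def call_with_backoff(send, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call `send()` until it stops returning 429 or attempts run out.

    Returns the last (status, body) pair received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the 429 back to the caller
        # Double the delay each attempt, capped at max_delay, plus jitter
        # so that many clients do not retry at the same instant.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

For RPM or TPM limits, a larger `base_delay` may be appropriate, since those windows span a full minute.
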
## Monitoring your usage

You can monitor your usage through the platform, including:

* Request counts
* Tokens and content usage
* Total cost and daily cost

## Increasing rate limits

If your application requires higher throughput or sustained traffic:

* Custom limits or enterprise plans may be available.
* Contact the Octen team to discuss your use case.
* Support: [support@octen.ai](mailto:support@octen.ai)
