Octen | The Search infrastructure for AI

Octen enforces rate limits to ensure platform stability and fair usage across all customers. Rate limits define how frequently you can call the API within a given time window. Requests that exceed these limits will be temporarily rejected.

Search API, Embedding API & VL Embedding API

The following QPS (Queries Per Second) limits are shared across the Search API, Embedding API, and VL Embedding API:

Subscription	QPS Limit
Free	5
Base	20
Pro	50
Scale	100
Enterprise	Custom
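
As an illustration of staying under a QPS limit on the client side, the sketch below spaces calls at least 1/QPS seconds apart. The class name QpsPacer and its parameters are hypothetical and not part of any Octen SDK; the sketch is not thread-safe, and the server may measure its window differently.

```python
import time
from typing import Callable


class QpsPacer:
    """Space calls at least 1/qps seconds apart (single-threaded sketch)."""

    def __init__(self, qps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_time = clock()  # earliest time the next call may go out

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_time:
            self.sleep(self.next_time - now)
            now = self.next_time
        self.next_time = now + self.interval
```

Calling wait() before each request on the Free tier (QpsPacer(5)) would keep requests roughly 0.2 seconds apart.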

How rate limits are applied

Rate limits are determined by your account plan and optional per-key caps:

Each account has a default rate limit based on its subscription tier.
When creating an API key, you may configure an additional rate limit for that key.
If both are configured, requests are throttled by whichever limit is lower (account or API key).
Different APIs may have different rate limits.
Limits may vary depending on your usage tier or agreement.
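
The "whichever limit is lower" rule can be sketched as follows. The function name effective_qps and the PLAN_QPS mapping are hypothetical helpers for illustration; the values are copied from the subscription table, and Enterprise is left out because its limit is custom.

```python
from typing import Optional

# Default QPS per subscription tier, copied from the table above.
PLAN_QPS = {"Free": 5, "Base": 20, "Pro": 50, "Scale": 100}


def effective_qps(plan: str, key_qps: Optional[int] = None) -> int:
    """Return the limit that applies: the lower of account and per-key limits."""
    account_qps = PLAN_QPS[plan]
    if key_qps is None:
        # No per-key cap configured, so the account limit applies.
        return account_qps
    return min(account_qps, key_qps)
```

For example, a Pro account with a key capped at 10 QPS is throttled at 10, while the same account with a key capped at 80 QPS is still throttled at 50.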

Extract

The Extract API uses a rate limiting model based on requests per minute (RPM):

Metric	Limit
RPM	100

Web Chat & Broad Search

Web Chat and Broad Search use a rate limiting model based on requests per minute (RPM) and tokens per minute (TPM):

Metric	Limit
RPM	20
TPM	500,000
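
One way to track both budgets on the client side is a sliding 60-second window, sketched below. This is an assumption about how to approximate the limits, not a description of how Octen measures them; the class name MinuteBudget is hypothetical, and the caller is expected to supply its own token estimate.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Client-side sliding 60-second window over request and token counts."""

    def __init__(self, rpm: int = 20, tpm: int = 500_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both budgets."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A request is sent only when try_acquire returns True; otherwise the caller waits and tries again later.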

Deep Research

Deep Research supports only one request at a time. Wait for the in-flight request to finish before sending another.
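
Within a single process, the one-at-a-time constraint can be enforced with a semaphore, as in this minimal sketch. The helper run_exclusively is hypothetical, and it only serializes calls made from the same process; the source does not state whether the limit is scoped per account or per API key.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One slot, matching the single-concurrent-request limit described above.
_deep_research_slot = threading.BoundedSemaphore(1)


def run_exclusively(call: Callable[[], T]) -> T:
    """Run `call` while holding the only slot; other threads wait their turn."""
    with _deep_research_slot:
        return call()
```

Wrapping every Deep Research call in run_exclusively means a second thread blocks until the first call returns.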

What happens when you exceed a limit

If a request exceeds the allowed rate:

The request is rejected with a rate limit error.
HTTP status: 429
Response body: code is 429, and msg states the reason (account limit or API key limit) and the applicable rate limit.
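
A client might detect and unpack this error as sketched below. The sketch assumes the response body is a JSON object with code and msg fields, which the description above suggests but does not confirm; RateLimitInfo and parse_rate_limit are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitInfo:
    code: int
    msg: str


def parse_rate_limit(status: int, body: str) -> Optional[RateLimitInfo]:
    """Return rate limit details for a 429 response, or None otherwise."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # Fall back to the HTTP status if the body carries no integer code.
    code = payload.get("code")
    if not isinstance(code, int):
        code = status
    return RateLimitInfo(code=code, msg=str(payload.get("msg", "")))
```

The msg field can then be logged or inspected to see whether the account limit or the API key limit was hit.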

Recommended retry behavior

When receiving a rate limit error:

Check msg to see which limit was hit (account or API key) and its value.
Avoid immediate retries in a tight loop.
Resume requests after the rate limit window resets.
For high-throughput or bursty workloads, batching requests where supported can help reduce pressure on rate limits.
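
The guidance above does not say how long to wait or whether a reset time is returned, so the following sketch uses exponential backoff with jitter as one common way to avoid tight retry loops. That choice is an assumption, not Octen's documented behavior. The send callable is a hypothetical stand-in for whatever function issues the request and returns a (status, body) pair.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    # Every attempt was rate limited; hand the last response to the caller.
    return status, body
```

The jitter spreads retries from multiple clients over time so they do not all hit the API at the same moment after a rejection.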

Monitoring your usage

You can monitor your usage through the platform, including:

Request counts
Tokens and content usage
Total cost and daily cost

Increasing rate limits

If your application requires higher throughput or sustained traffic:

Custom limits or enterprise plans may be available.
Contact the Octen team to discuss your use case.
Support: support@octen.ai
