Octen enforces rate limits to ensure platform stability and fair usage across all customers.
Rate limits define how frequently you can call the API within a given time window. Requests that exceed these limits will be temporarily rejected.
Search API, Embedding API & VL Embedding API
The following QPS (Queries Per Second) limits are shared across the Search API, Embedding API, and VL Embedding API:
| Subscription | QPS Limit |
|---|---|
| Free | 5 |
| Base | 20 |
| Pro | 50 |
| Scale | 100 |
| Enterprise | Custom |
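Pacing calls on the client side is one way to stay under a QPS limit. The sketch below spaces calls at least 1/QPS seconds apart. `QpsThrottle` and `QPS_LIMITS` are hypothetical names introduced for illustration and are not part of any Octen SDK. Because the limit is shared across the three APIs, one throttle instance would need to cover calls to all of them.

```python
import time

# QPS limits copied from the table above. Enterprise is custom, so it is omitted.
QPS_LIMITS = {"Free": 5, "Base": 20, "Pro": 50, "Scale": 100}


class QpsThrottle:
    """Hypothetical client-side pacer: spaces calls at least 1/qps seconds apart.

    Not thread-safe; wrap wait() in a lock if several threads share it.
    """

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed and return the time slept."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        # Reserve the next slot one interval after this call.
        self._next_allowed = now + self.min_interval
        return max(delay, 0.0)
```

Call `throttle.wait()` immediately before each request. A fixed interval is stricter than a per-second counter, so it trades some burst capacity for a lower chance of hitting a 429.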
How rate limits are applied
Rate limits are determined by your account plan and optional per-key caps:
- Each account has a default rate limit based on its subscription tier.
- When creating an API key, you may configure an additional rate limit for that key.
- If both are configured, requests are throttled by whichever limit is lower (account or API key).
- Different APIs may have different rate limits.
- Limits may vary depending on your usage tier or agreement.
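The "whichever limit is lower" rule above can be written as a one-line calculation. This is only a sketch of the arithmetic, and `effective_limit` is a hypothetical helper name, not an Octen function.

```python
from typing import Optional


def effective_limit(account_limit: int, key_limit: Optional[int] = None) -> int:
    """Return the limit that actually throttles requests.

    account_limit: the limit from the subscription tier.
    key_limit: the optional per-key limit, or None if the key has none.
    """
    if key_limit is None:
        return account_limit
    return min(account_limit, key_limit)
```

For example, a Pro account (50 QPS) using a key capped at 10 QPS would be throttled at 10 QPS, while a key capped above 50 would still be held to the account's 50.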
Extract API
The Extract API uses a rate limiting model based on requests per minute (RPM).
Web Chat & Broad Search
Web Chat and Broad Search use a rate limiting model based on requests per minute (RPM) and tokens per minute (TPM):
| Metric | Limit |
|---|---|
| RPM | 20 |
| TPM | 500,000 |
Deep Research
Only a single concurrent request is supported.
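One way to respect a single-request limit inside one process is to serialize calls behind a lock. The sketch below assumes all Deep Research calls come from the same Python process. `run_exclusively` is a hypothetical helper, and `call` stands in for whatever function sends the request.

```python
import threading

# One lock shared by every Deep Research call in this process.
_deep_research_lock = threading.Lock()


def run_exclusively(call, *args, **kwargs):
    """Run call(...) while holding the lock, so only one request is in flight."""
    with _deep_research_lock:
        return call(*args, **kwargs)
```

This does not coordinate across processes or machines; multiple workers sharing one account would need an external queue or lock instead.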
What happens when you exceed a limit
If a request exceeds the allowed rate:
- The request is rejected with a rate limit error.
- HTTP status: 429
- Response body: code=429, and msg includes the reason (account limit or API key limit) and the applicable rate limit.
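A client can detect this case by checking the HTTP status and then reading msg. The sketch below assumes the response body is a JSON object with `code` and `msg` fields. The exact body format is not specified here, so treat that as an assumption. `rate_limit_message` is a hypothetical helper name.

```python
import json
from typing import Optional


def rate_limit_message(status: int, body_text: str) -> Optional[str]:
    """Return the msg text for a rate limit error, or None for other responses.

    Assumes a JSON body such as {"code": 429, "msg": "..."}.
    Returns an empty string if the status is 429 but the body cannot be parsed.
    """
    if status != 429:
        return None
    try:
        body = json.loads(body_text)
    except ValueError:
        return ""
    if not isinstance(body, dict):
        return ""
    return str(body.get("msg", ""))
```

Logging the returned text makes it easier to tell whether the account limit or the API key limit was the one that triggered.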
Recommended retry behavior
When receiving a rate limit error:
- Check the msg field for details on which limit was hit.
- Avoid immediate retries in a tight loop.
- Resume requests after the rate limit window resets.
- For high-throughput or bursty workloads, batching requests where supported can help reduce pressure on rate limits.
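A common way to avoid tight retry loops is exponential backoff with jitter. The guidance above does not prescribe this technique, so the sketch below is one possible approach rather than Octen's recommended algorithm. `send` is a hypothetical zero-argument callable returning `(status, body)`, and the delay values are illustrative defaults.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Returns the last (status, body) pair, which may still be a 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # Out of attempts: do not sleep again.
        # Full jitter: wait a random time up to an exponentially growing cap.
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, cap))
    return status, body
```

The random jitter keeps several clients that were limited at the same moment from retrying in lockstep. For per-minute limits (RPM or TPM), a larger `base_delay` may be more appropriate than the one-second default used here.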
Monitoring your usage
You can monitor your usage through the platform, including:
- Request counts
- Tokens and content usage
- Total cost and daily cost
Increasing rate limits
If your application requires higher throughput or sustained traffic:
- Custom limits or enterprise plans may be available.
- Contact the Octen team to discuss your use case.
- Support: support@octen.ai