Web Search API & Embedding API
The following QPS (Queries Per Second) limits apply to both the Web Search API and the Embedding API:
| Subscription | QPS Limit |
|---|---|
| Free | 5 |
| Base | 20 |
| Pro | 50 |
| Scale | 100 |
| Enterprise | Custom |
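
As a client-side safeguard, requests can be paced so they stay under the tier's QPS limit. The sketch below is a minimal pacer, not an official Octen client: it reserves evenly spaced send slots `1/qps` seconds apart. `TIER_QPS` copies the table above, while the class name `QpsPacer` and its `acquire` method are hypothetical. It assumes that evenly spacing requests is enough to stay under the limit; the exact counting window the server uses is not documented here.

```python
import threading
import time

# QPS limits from the table above (Enterprise limits are custom, so omitted).
TIER_QPS = {"Free": 5, "Base": 20, "Pro": 50, "Scale": 100}


class QpsPacer:
    """Hypothetical client-side pacer: spaces requests at least 1/qps seconds apart."""

    def __init__(self, qps: float) -> None:
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.interval = 1.0 / qps
        self.next_slot = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until this caller's reserved send slot arrives."""
        with self.lock:
            # Reserve the earliest free slot and push the next one back by one interval.
            slot = max(time.monotonic(), self.next_slot)
            self.next_slot = slot + self.interval
        # Sleep outside the lock so other threads can reserve their own slots.
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


pacer = QpsPacer(TIER_QPS["Pro"])
for _ in range(3):
    pacer.acquire()
    # send_search_request(...)  # hypothetical call to the Web Search API
```

Because slots are reserved under a lock, several threads can share one pacer. A token bucket that allows short bursts is an alternative, but only if the server tolerates bursts within its counting window.
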
How rate limits are applied
Rate limits are determined by your account plan and optional per-key caps:
- Each account has a default rate limit based on its subscription tier.
- When creating an API key, you may configure an additional rate limit for that key.
- If both are configured, requests are throttled by whichever limit is lower (account or API key).
- Different APIs may have different rate limits.
- Limits may vary depending on your usage tier or agreement.
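
The "whichever limit is lower" rule reduces to taking a minimum. A minimal sketch, where `effective_qps` is a hypothetical helper name and an unset per-key cap is represented as `None`:

```python
from typing import Optional


def effective_qps(account_qps: int, key_qps: Optional[int] = None) -> int:
    """Hypothetical helper: the limit that actually throttles requests made with one key.

    The account limit always applies; a per-key cap only matters when it is lower.
    """
    if key_qps is None:
        return account_qps
    return min(account_qps, key_qps)


# A Pro account (50 QPS) with a key capped at 10 QPS is throttled at 10 QPS.
print(effective_qps(50, 10))  # 10
# A Free account (5 QPS) with a key capped at 20 QPS is still throttled at 5 QPS.
print(effective_qps(5, 20))  # 5
# With no per-key cap, the account limit applies.
print(effective_qps(20))  # 20
```
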
Web Chat API
The Web Chat API uses a rate limiting model based on requests per minute (RPM) and tokens per minute (TPM):
| Metric | Limit |
|---|---|
| RPM | 20 |
| TPM | 500,000 |
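
Both Web Chat limits must hold at the same time. At the full 20 RPM, the 500,000 TPM budget averages out to 500,000 / 20 = 25,000 tokens per request, so fewer, larger requests can exhaust TPM before RPM is reached. The sketch below tracks both limits in a client-side sliding 60-second window before sending. It is hypothetical: `ChatBudget` and `try_reserve` are illustrative names, and it assumes you can estimate a request's token count in advance and that the server's minute window behaves like a sliding window. How the service counts tokens (prompt, completion, or both) is not stated here.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class ChatBudget:
    """Hypothetical client-side check against the Web Chat RPM and TPM limits."""

    def __init__(self, rpm: int = 20, tpm: int = 500_000, window: float = 60.0) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        # (timestamp, estimated tokens) for each request sent within the window.
        self.events: Deque[Tuple[float, int]] = deque()
        self.tokens_in_window = 0

    def _prune(self, now: float) -> None:
        # Forget requests older than the window so their budget is freed.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_reserve(self, tokens: int, now: Optional[float] = None) -> bool:
        """Record the request and return True if it fits both limits; otherwise False."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        if len(self.events) >= self.rpm or self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


budget = ChatBudget()
# 20 requests fit in one minute; the 21st is refused by the RPM limit.
print(sum(budget.try_reserve(1_000, now=0.0) for _ in range(21)))  # 20
# After 60 seconds the earlier requests leave the window.
print(budget.try_reserve(1_000, now=60.0))  # True
```

The optional `now` argument exists only to make the example deterministic; in normal use it is omitted and the monotonic clock is used.
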
What happens when you exceed a limit
If a request exceeds the allowed rate:
- The request is rejected with a rate limit error.
- HTTP status: `429`
- Response body: `code=429`, and `msg` includes the reason (account limit or API key limit) and the applicable rate limit.
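
A client can turn this error into an exception it handles explicitly. The sketch assumes the response body is JSON with top-level `code` and `msg` fields. The field names come from the description above, but the JSON encoding is an assumption, and `RateLimitError` and `raise_for_rate_limit` are hypothetical names rather than part of any Octen SDK.

```python
import json


class RateLimitError(Exception):
    """Hypothetical exception carrying the server's `msg` for a 429 response."""

    def __init__(self, msg: str) -> None:
        super().__init__(msg)
        self.msg = msg


def raise_for_rate_limit(status: int, body: str) -> None:
    """Raise RateLimitError if the response is a rate limit error; otherwise do nothing."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = None
    if not isinstance(payload, dict):
        payload = {}
    # Treat either the HTTP status or the body's code as the rate limit signal.
    if status != 429 and payload.get("code") != 429:
        return
    # Keep the server's explanation (account limit vs. API key limit) for logging.
    raise RateLimitError(str(payload.get("msg") or "rate limit exceeded"))


try:
    # Hypothetical response values, for illustration only.
    raise_for_rate_limit(429, '{"code": 429, "msg": "example: API key limit exceeded"}')
except RateLimitError as err:
    print(err.msg)  # example: API key limit exceeded
```
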
Recommended retry behavior
When receiving a rate limit error:
- Check the `msg` field for details.
- Avoid immediate retries in a tight loop.
- Resume requests after the rate limit window resets.
- For high-throughput or bursty workloads, batching requests where supported can help reduce pressure on rate limits.
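
The retry advice above can be sketched as exponential backoff with jitter, a common general-purpose technique rather than behavior prescribed by Octen. In this hypothetical helper, `call` performs one request and `is_rate_limited` decides whether an exception is a rate limit error (for example, one raised on HTTP 429). The default `base_delay` of one second is an assumption matched to a per-second QPS window; for the per-minute Web Chat limits, a larger base delay is likely more appropriate.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_rate_limited: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry `call` on rate limit errors with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-rate-limit errors, or once attempts are exhausted.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # Double the delay each attempt (capped), then wait between half and
            # all of it ("equal jitter") so retries are neither immediate nor synchronized.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay / 2 + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Half of each delay is fixed and half is random, so a retry never fires immediately, yet clients that were throttled together do not all retry at the same instant. The `sleep` parameter exists so tests can record delays instead of waiting.
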
Monitoring your usage
You can monitor your API usage through the platform, including:
- Request counts
- Tokens and content usage
- Total cost and daily cost
Increasing rate limits
If your application requires higher throughput or sustained traffic:
- Custom limits or enterprise plans may be available.
- Contact the Octen team to discuss your use case.
- Support: support@octen.ai