POST /extract
curl --request POST \
  --url https://api.octen.ai/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "urls": [
    "https://octen.ai/",
    "https://docs.octen.ai/api-reference/search"
  ]
}
'

Example response:
{
  "code": 0,
  "msg": "success",
  "request_id": "req_abc123def456",
  "data": {
    "results": [
      {
        "url": "https://octen.ai/",
        "status": "success",
        "title": "Octen | The Search infrastructure for AI",
        "full_content": "# Search infrastructure for AI\n\nReal-time indexing | Low latency | High reliability\n\nStart Building | View Docs\n\n## Search Beyond Text\n\nBeyond text queries: Octen's multimodal search understands images and videos alongside text...",
        "highlights": null,
        "time_published": null,
        "time_last_crawled": "2026-07-03T09:35:43Z",
        "page_structure": {
          "primary": "Index Page",
          "secondary": "Home Page"
        },
        "category": {
          "primary": "Computers, Electronics & Technology",
          "secondary": "Search Engines"
        },
        "favicon": "https://octen.ai/favicon.ico",
        "cover_image": {
          "url": "https://octen.ai/_next/static/media/octen-cover.dc74905e.png"
        },
        "images": [
          {
            "url": "https://octen.ai/_next/static/media/multi-modal-img.ccffddff.png"
          },
          {
            "url": "https://octen.ai/_next/static/media/showcase-multimodal1.449ae003.png"
          }
        ],
        "videos": [
          {
            "url": "https://octen.ai/static/video/multimodal-video.mp4"
          }
        ]
      },
      {
        "url": "https://docs.octen.ai/api-reference/search",
        "status": "success",
        "title": "Search - Octen",
        "full_content": "# Search API\n\nOcten Search API enables ranked web results with query-focused highlights, time filtering, and multimodal assets...",
        "highlights": null,
        "time_published": "2024-10-15T00:00:00Z",
        "time_last_crawled": "2026-07-03T09:34:29Z",
        "page_structure": {
          "primary": "Content Page",
          "secondary": "Code"
        },
        "category": {
          "primary": "Computers, Electronics & Technology",
          "secondary": "Search Engines"
        },
        "favicon": "https://docs.octen.ai/favicon.ico",
        "cover_image": {
          "url": "https://octen.ai/_next/static/media/octen-cover.dc74905e.png"
        }
      }
    ]
  },
  "meta": {
    "usage": {
      "total_urls": 2,
      "successful_urls": 2
    },
    "latency": 1832,
    "warning": null
  }
}

Authorizations

x-api-key (string, header, required)

API key used for request authentication. Obtain an API key before using the API. Note: A payment method is required to use the API.
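If you are not using curl, you can build the same authenticated request in Python with only the standard library. This is a minimal sketch, not an official SDK. The function name `build_extract_request` and the `OCTEN_API_KEY` environment variable are hypothetical names chosen for illustration. The endpoint URL and the two headers come from the curl example.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

# Endpoint taken from the curl example.
EXTRACT_URL = "https://api.octen.ai/extract"


def build_extract_request(body: Dict[str, Any], api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /extract request carrying the x-api-key header."""
    # OCTEN_API_KEY is a hypothetical variable name used only in this sketch.
    key = api_key or os.environ.get("OCTEN_API_KEY")
    if not key:
        raise ValueError("an API key is required for the x-api-key header")
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": key},
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=...)` and decode the reply with `json.load`. The API does not specify the client-side timeout value, so choosing it is up to you.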

Body (application/json)

Request body for the Extract API.

urls (string[], required)

List of URLs to extract content from. Maximum URLs per request: 20. Maximum length per URL: 2,048 characters. Failed URLs are not billed.

Example: ["https://example.com/article-1", "https://example.com/article-2"]
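Because each request accepts at most 20 URLs of at most 2,048 characters each, a client with a longer list has to split it into batches. The sketch below is a hypothetical client-side helper (`batch_urls` is not part of the API). It rejects over-long URLs up front and yields lists that each fit one request's urls field.

```python
from typing import Iterator, List

MAX_URLS_PER_REQUEST = 20  # documented per-request limit
MAX_URL_LENGTH = 2048      # documented per-URL length limit


def batch_urls(urls: List[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield lists of at most `size` URLs, each usable as one request's `urls` field."""
    too_long = [u for u in urls if len(u) > MAX_URL_LENGTH]
    if too_long:
        raise ValueError(f"{len(too_long)} URL(s) exceed {MAX_URL_LENGTH} characters")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

For example, 45 URLs produce batches of 20, 20, and 5.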

query (string)

Intent-focused keywords describing what you want from the pages. When provided, the API returns query-relevant highlights for each URL; otherwise, it returns the complete page content.

Maximum string length: 500
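Whether you get highlights or full content depends only on whether query is present, so a body builder should leave the field out entirely when no query is intended. The sketch below shows one way to do that. `build_extract_body` is a hypothetical helper. It also enforces the 500-character limit and drops optional fields set to None, so that server defaults apply.

```python
from typing import Any, Dict, List, Optional

MAX_QUERY_LENGTH = 500  # documented maximum string length


def build_extract_body(urls: List[str], query: Optional[str] = None, **options: Any) -> Dict[str, Any]:
    """Assemble a request body; omit `query` to receive full page content instead of highlights."""
    body: Dict[str, Any] = {"urls": list(urls)}
    if query is not None:
        if len(query) > MAX_QUERY_LENGTH:
            raise ValueError(f"query exceeds {MAX_QUERY_LENGTH} characters")
        body["query"] = query
    # Pass optional fields (format, timeout, include_images, ...) through,
    # skipping any explicitly set to None so the server defaults are used.
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```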

max_age_seconds (integer, default: 86400)

Maximum age (in seconds) of cached content. URLs whose cached version exceeds this threshold will be re-fetched. Values outside the allowed range are adjusted to the nearest bound.

Required range: x >= 300
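The server's caching logic is not documented beyond the rule above. The sketch below mirrors that rule on the client side, which can help you predict whether a response will reflect a fresh crawl. It is an approximation under stated assumptions. The helper name `needs_refetch` is hypothetical. It compares elapsed time against the effective threshold (values below 300 are raised to 300) using the time_last_crawled timestamp format from the example response, and it treats "exceeds" as strictly greater than.

```python
from datetime import datetime, timezone
from typing import Optional

MIN_MAX_AGE = 300  # documented lower bound for max_age_seconds


def needs_refetch(time_last_crawled: str, max_age_seconds: int = 86400,
                  now: Optional[datetime] = None) -> bool:
    """Approximate the documented rule: cached content older than max_age_seconds is re-fetched."""
    # Out-of-range values are adjusted to the nearest bound, per the parameter docs.
    effective = max(MIN_MAX_AGE, max_age_seconds)
    # Timestamps in the example look like "2026-07-03T09:35:43Z" (UTC).
    crawled = datetime.fromisoformat(time_last_crawled.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - crawled).total_seconds() > effective
```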

format (enum<string>, default: markdown)

Format of the returned content.

Available options: markdown, text

timeout (integer, default: 30)

Per-URL extraction timeout in seconds. Values outside the allowed range are adjusted to the nearest bound.

Required range: 1 <= x <= 60
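Because out-of-range timeouts are adjusted rather than rejected, a client may want to know the value the server will actually use, for example when logging it. The small sketch below reproduces the documented clamp to the range 1 to 60 seconds. `effective_timeout` is a hypothetical helper, not an API function.

```python
TIMEOUT_MIN, TIMEOUT_MAX = 1, 60  # documented range, in seconds


def effective_timeout(requested: int) -> int:
    """Clamp a requested per-URL timeout to the documented [1, 60] range."""
    return min(max(requested, TIMEOUT_MIN), TIMEOUT_MAX)
```

For example, a requested value of 90 becomes 60, and 0 becomes 1.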

include_images (boolean, default: false)

Whether to return image URLs detected on the page.

include_videos (boolean, default: false)

Whether to return video URLs detected on the page.

include_audio (boolean, default: false)

Whether to return audio URLs detected on the page.
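In the example response, detected media appear as lists of objects with a url field, under the keys "images" and "videos". A page with no assets may omit a list entirely, as the second result does. The sketch below gathers those URLs from one result entry. The "audio" key is an assumption: the example response does not show audio, so the actual field name may differ. `media_urls` is a hypothetical helper.

```python
from typing import Any, Dict, List


def media_urls(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect asset URLs from one entry of data.results.

    "images" and "videos" match the example response; "audio" is an ASSUMED key name.
    Missing or null lists are treated as empty.
    """
    return {
        kind: [item["url"] for item in (result.get(kind) or []) if item.get("url")]
        for kind in ("images", "videos", "audio")
    }
```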

Response

Successful extraction response

code (integer)

Business status code. 0 indicates success.

msg (string)

A message describing the result.

request_id (string)

The unique identifier for this request.

data (object)

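The main extract response payload. Its results array holds the per-URL extraction results, with fields such as url, status, title, full_content, and highlights.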

meta (object)

Additional metadata for the extract request, such as usage counts (usage.total_urls, usage.successful_urls), latency, and any warning.
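A client typically checks code first, then separates successful from failed URLs. Failed URLs are not billed, so the failed count is also useful for reconciling usage. The sketch below follows the field layout of the example response. Two parts are assumptions: `ExtractError` and `parse_extract_response` are hypothetical names, and this page does not list the non-success status values, so any status other than "success" is treated as a failure.

```python
import json
from typing import Any, Dict, List, Tuple


class ExtractError(Exception):
    """Hypothetical exception raised when the business status code is non-zero."""


def parse_extract_response(raw: str) -> Tuple[List[Dict[str, Any]], int]:
    """Return (successful results, failed URL count) from an Extract API JSON response."""
    payload = json.loads(raw)
    if payload.get("code") != 0:  # 0 indicates success
        raise ExtractError(f"{payload.get('msg')} (request_id={payload.get('request_id')})")
    results = (payload.get("data") or {}).get("results") or []
    # ASSUMPTION: any status other than "success" counts as a failed URL.
    ok = [r for r in results if r.get("status") == "success"]
    usage = (payload.get("meta") or {}).get("usage") or {}
    failed = usage.get("total_urls", len(results)) - usage.get("successful_urls", len(ok))
    return ok, failed
```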