POST /v1/images/generations
Image Generation
curl --request POST \
  --url https://api.octen.ai/v1/images/generations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "model": "openai/gpt-image-2",
  "prompt": "<string>",
  "image": "<string>",
  "mask": "<string>",
  "n": 1,
  "size": "auto",
  "quality": "auto",
  "background": "auto",
  "output_format": "png",
  "output_compression": 100,
  "moderation": "auto",
  "thinking_level": "minimal",
  "response_modalities": [
    "text",
    "image"
  ],
  "response_format": "b64_json",
  "stream": false,
  "partial_images": 0,
  "user": "<string>"
}
'
{
  "id": "img-9f2c1a7b3e6d4082",
  "created": 1749812456,
  "background": "opaque",
  "output_format": "png",
  "quality": "high",
  "size": "1536x1024",
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUgAA..."
    }
  ],
  "usage": {
    "input_tokens": 22,
    "input_tokens_details": {
      "text_tokens": 22,
      "image_tokens": 0
    },
    "output_tokens": 1056,
    "output_tokens_details": {
      "image_tokens": 1056,
      "text_tokens": 0
    },
    "total_tokens": 1078
  }
}

Authorizations

x-api-key
string
header
required

API key used for request authentication. Obtain an API key before using the API. Note: A payment method is required to use the API.
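As a minimal sketch of how the key is attached, the Python snippet below builds (but does not send) a request with the standard library. The helper name build_request and the OCTEN_API_KEY environment variable are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.octen.ai/v1/images/generations"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries the API key in the x-api-key header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Hypothetical usage; sending the request needs a valid key:
# import os
# req = build_request(os.environ["OCTEN_API_KEY"],
#                     {"model": "openai/gpt-image-2", "prompt": "a red fox"})
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```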

Body

Request body for the Image Generation API. Text-to-image when no image is provided; image editing when an image is provided. Parameters marked (GPT only) or (Gemini only) apply only to those model families; unsupported parameters are ignored.
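Because unsupported parameters are ignored rather than rejected, a client may want to flag them before sending. The sketch below groups parameters by the (GPT only) and (Gemini only) markers used on this page. The helper name ignored_params and the prefix-based family check are assumptions made for illustration.

```python
GPT_ONLY = frozenset({
    "mask", "n", "quality", "background", "output_compression",
    "moderation", "stream", "partial_images",
})
GEMINI_ONLY = frozenset({"thinking_level", "response_modalities", "media_resolution"})
GEMINI_ALIASES = frozenset({"nano-banana-pro", "nano-banana-2"})


def ignored_params(payload: dict) -> list:
    """Return the payload keys that the chosen model family would ignore."""
    model = payload.get("model", "")
    if model.startswith("openai/"):
        unsupported = GEMINI_ONLY
    elif model.startswith("google/") or model in GEMINI_ALIASES:
        unsupported = GPT_ONLY
    else:
        return []  # unknown model: make no claim
    return sorted(key for key in payload if key in unsupported)
```

For example, a payload for openai/gpt-image-2 that also sets thinking_level would report ["thinking_level"].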

model
enum<string>
required

The model to use. Aliases nano-banana-pro (= google/gemini-3-pro-image) and nano-banana-2 (= google/gemini-3.1-flash-image) are also accepted.

Available options:
openai/gpt-image-2,
openai/gpt-image-1-mini,
google/gemini-3-pro-image,
google/gemini-3.1-flash-image
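The alias mapping above can be written as a small lookup table. This is a client-side sketch only; the function names are hypothetical, and the API accepts the aliases directly, so resolving them locally is optional.

```python
MODEL_ALIASES = {
    "nano-banana-pro": "google/gemini-3-pro-image",
    "nano-banana-2": "google/gemini-3.1-flash-image",
}


def canonical_model(name: str) -> str:
    """Map an alias to its full model name; other names pass through unchanged."""
    return MODEL_ALIASES.get(name, name)


def model_family(name: str) -> str:
    """Return "gpt" or "gemini" based on the provider prefix of the model name."""
    canonical = canonical_model(name)
    if canonical.startswith("openai/"):
        return "gpt"
    if canonical.startswith("google/"):
        return "gemini"
    raise ValueError(f"unrecognized model: {name!r}")
```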
prompt
string
required

The text description for the model. The length limit is set by the model.

image

Reference or source image(s) for editing, base64-encoded, either as a data URL (with a data: prefix) or as a raw base64 string. Providing an image switches the request to edit mode. Pass an array to supply multiple reference images. Per-model limits apply.
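A minimal sketch of preparing the image field in Python, covering both the raw base64 form and the data URL form. The helper names encode_image and build_payload are illustrative and not part of the API.

```python
import base64
from typing import Optional, Sequence


def encode_image(image_bytes: bytes, mime_type: Optional[str] = None) -> str:
    """Base64-encode image bytes; add a data URL prefix when mime_type is given."""
    raw = base64.b64encode(image_bytes).decode("ascii")
    if mime_type is None:
        return raw
    return f"data:{mime_type};base64,{raw}"


def build_payload(model: str, prompt: str, images: Sequence[str] = ()) -> dict:
    """Build a request body: text-to-image without images, edit mode with them."""
    payload = {"model": model, "prompt": prompt}
    if len(images) == 1:
        payload["image"] = images[0]
    elif len(images) > 1:
        payload["image"] = list(images)  # array form for multiple references
    return payload
```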

mask
string

(GPT only) Mask image, base64-encoded PNG matching the input size. Transparent areas are repainted. Effective only when image is provided.
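To illustrate what such a mask looks like, the sketch below writes an RGBA PNG with only the Python standard library: fully opaque except for a transparent rectangle that marks the region to repaint. The function name make_mask_png and the rectangle convention are assumptions, and the pure-Python pixel loop is meant for illustration rather than for large images.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_mask_png(width: int, height: int, box: tuple) -> bytes:
    """Return an RGBA PNG that is opaque except for a transparent rectangle.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    opaque = b"\x00\x00\x00\xff"
    clear = b"\x00\x00\x00\x00"
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # filter type 0 (None) for this scanline
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            rows += clear if inside else opaque
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), then three zero flags
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(rows)))
        + _chunk(b"IEND", b"")
    )
```

The width and height passed in should equal those of the input image, since the mask must match the input size.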

n
integer
default:1

(GPT only) Number of images to generate. For Gemini, request multiple images in the prompt instead.

Required range: 1 <= x <= 10
size
string
default:auto

Output size as width x height (e.g. 1024x1024) or auto. GPT accepts arbitrary sizes; Gemini maps to the nearest supported aspect ratio.
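A small sketch for validating a size string on the client and reducing it to an aspect ratio, which is the quantity Gemini models match against. The helper names are hypothetical, and the set of aspect ratios Gemini actually supports is not listed here, so the sketch stops at computing the ratio.

```python
import math
import re
from typing import Optional, Tuple

_SIZE_PATTERN = re.compile(r"(\d+)x(\d+)")


def parse_size(size: str) -> Optional[Tuple[int, int]]:
    """Parse "WIDTHxHEIGHT" into (width, height); return None for "auto"."""
    if size == "auto":
        return None
    match = _SIZE_PATTERN.fullmatch(size)
    if match is None:
        raise ValueError(f"size must be 'auto' or WIDTHxHEIGHT, got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height


def reduced_aspect_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce width:height to lowest terms, e.g. 1536x1024 becomes 3:2."""
    divisor = math.gcd(width, height)
    return width // divisor, height // divisor
```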

quality
enum<string>
default:auto

(GPT only) Generation quality. Higher is more detailed, slower, and more expensive.

Available options:
low,
medium,
high,
auto
background
enum<string>
default:auto

(GPT only) Background setting. gpt-image-2 does not support transparent backgrounds and returns 400 for transparent.

Available options:
transparent,
opaque,
auto
output_format
enum<string>
default:png

Output image format. GPT supports png/jpeg/webp; Gemini supports png/jpeg (webp falls back to png).

Available options:
png,
jpeg,
webp
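The fallback rule can be sketched as a function that predicts the format to expect in the response. The function name and the prefix-based Gemini check are assumptions; the output_format field of the response remains the authoritative value.

```python
SUPPORTED_FORMATS = ("png", "jpeg", "webp")


def expected_output_format(model: str, requested: str = "png") -> str:
    """Predict the applied format: Gemini models fall back from webp to png."""
    if requested not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {requested!r}")
    # Assumption: Gemini models are named google/... or use a nano-banana alias.
    is_gemini = model.startswith("google/") or model.startswith("nano-banana")
    if is_gemini and requested == "webp":
        return "png"
    return requested
```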
output_compression
integer
default:100

(GPT only) Compression level (percent). Effective only when output_format is jpeg or webp.

Required range: 0 <= x <= 100
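The ranges and the background restriction documented so far can be checked before a request is sent. The following is a client-side sketch with a hypothetical helper name; the server remains the final authority on what it accepts.

```python
def validate_params(payload: dict) -> list:
    """Return human-readable problems found in the payload (empty if none)."""
    errors = []
    n = payload.get("n", 1)
    if not 1 <= n <= 10:
        errors.append("n must satisfy 1 <= n <= 10")
    compression = payload.get("output_compression", 100)
    if not 0 <= compression <= 100:
        errors.append("output_compression must satisfy 0 <= x <= 100")
    if (
        payload.get("model") == "openai/gpt-image-2"
        and payload.get("background") == "transparent"
    ):
        errors.append("gpt-image-2 returns 400 for background=transparent")
    return errors
```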
moderation
enum<string>
default:auto

(GPT only) Content moderation strength. low relaxes limits; auto is the default.

Available options:
low,
auto
thinking_level
enum<string>
default:minimal

(Gemini only) Thinking (reasoning) effort before generation.

Available options:
minimal,
high
response_modalities
enum<string>[]

(Gemini only) Controls returned modalities. ["text","image"] returns the image plus accompanying text; ["image"] returns only the image.

Available options:
text,
image
media_resolution
enum<string>

(Gemini only) Processing resolution for input reference images.

Available options:
low,
medium,
high
response_format
enum<string>
default:b64_json

Return format. Only base64 is supported.

Available options:
b64_json
stream
boolean
default:false

(GPT only) Whether to stream the response (SSE).

partial_images
integer
default:0

(GPT only) Number of intermediate preview images to stream. Effective only when stream=true. Each preview costs an extra 100 image output tokens.

Required range: 0 <= x <= 3
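As a worked example of the preview cost, the sketch below computes the extra image output tokens for a streaming request. It assumes every requested preview is actually emitted, which this page does not guarantee; the function name is hypothetical.

```python
PREVIEW_TOKEN_COST = 100  # extra image output tokens per preview, per this page


def preview_token_overhead(stream: bool, partial_images: int) -> int:
    """Upper-bound estimate of extra output tokens spent on preview images."""
    if not 0 <= partial_images <= 3:
        raise ValueError("partial_images must satisfy 0 <= x <= 3")
    if not stream:
        return 0  # partial_images has no effect unless stream=true
    return PREVIEW_TOKEN_COST * partial_images
```

With stream=true and partial_images=3, the estimate is 3 x 100 = 300 extra image output tokens.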
user
string

A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid passing personally identifiable information.
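One way to derive a pseudonymous identifier is a keyed hash of the internal user ID. The sketch below uses HMAC-SHA256 from the Python standard library; the function name and the idea of a server-side secret are assumptions, and any stable, non-reversible scheme would serve the same purpose.

```python
import hashlib
import hmac


def pseudonymous_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable hex identifier that does not reveal the internal ID."""
    digest = hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical usage: keep the secret on the server and pass the result as "user".
# payload["user"] = pseudonymous_user_id("alice@example.com", b"server-side-secret")
```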

Response

Successful image response. When stream=false, returns a single object. When stream=true (GPT only), returns an SSE stream of image_generation.partial_image / image_edit.partial_image preview events followed by an image_generation.completed / image_edit.completed event.
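A minimal sketch of reading such a stream, based on the generic Server-Sent Events line format (event: and data: fields, with a blank line ending each event). It assumes the event names listed above arrive in the event: field; this page does not show the wire format or the payload schema, so the data is returned as an unparsed string.

```python
from typing import Iterable, Iterator, Optional, Tuple

FINAL_EVENTS = frozenset({"image_generation.completed", "image_edit.completed"})


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs from decoded SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    # Lenient choice: also emit a trailing event that lacks a final blank line.
    if event is not None or data:
        yield event, "\n".join(data)


def is_final(event_name: Optional[str]) -> bool:
    """True for the completed events that end a stream."""
    return event_name in FINAL_EVENTS
```

A consumer would typically render each partial_image event as a preview and stop reading once is_final returns True.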

A non-streaming image response. Images are always returned as base64; no URL is returned.

id
string
required

The unique identifier for this request.

created
integer
required

Unix timestamp (in seconds) of when the request was created.

data
object[]
required

The generated images, one element per image.
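Since images always arrive as base64, a client has to decode each data element before use. The sketch below does so with the standard library; the helper names and the file naming scheme are illustrative assumptions.

```python
import base64


def decode_images(response: dict) -> list:
    """Decode every data[i].b64_json entry into raw image bytes."""
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]


def suggested_filenames(response: dict, stem: str = "image") -> list:
    """Name one file per image, using output_format as the extension."""
    extension = response.get("output_format", "png")
    return [f"{stem}-{i}.{extension}" for i in range(len(response["data"]))]


# Hypothetical usage with a parsed JSON response named body:
# for name, blob in zip(suggested_filenames(body), decode_images(body)):
#     with open(name, "wb") as fh:
#         fh.write(blob)
```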

usage
object
required

Token usage information.
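In the sample response above, the counts add up: 22 input tokens plus 1056 output tokens give the total of 1078, and each details object sums to its parent count. The sketch below checks those relationships; whether they hold for every response is an assumption drawn from that one sample, and the function name is hypothetical.

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check that detail counts sum to their parents and parents to the total."""
    inp = usage["input_tokens_details"]
    out = usage["output_tokens_details"]
    return (
        inp["text_tokens"] + inp["image_tokens"] == usage["input_tokens"]
        and out["text_tokens"] + out["image_tokens"] == usage["output_tokens"]
        and usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"]
    )


# The usage object from the sample response on this page.
SAMPLE_USAGE = {
    "input_tokens": 22,
    "input_tokens_details": {"text_tokens": 22, "image_tokens": 0},
    "output_tokens": 1056,
    "output_tokens_details": {"image_tokens": 1056, "text_tokens": 0},
    "total_tokens": 1078,
}
```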

background
enum<string>

(GPT only) The background setting actually applied.

Available options:
transparent,
opaque
output_format
enum<string>

The output format actually applied; for Gemini, derived from the mimeType.

Available options:
png,
jpeg,
webp
quality
enum<string>

(GPT only) The quality tier actually applied.

Available options:
low,
medium,
high
size
string

(GPT only) The output size actually applied; useful when size was auto.

text
string

(Gemini only) Plain-text summary of text the model produced alongside the image. For exact interleaving, read parts.

parts
object[]

(Gemini only) Content parts in the model's original order, preserving text/image interleaving.