Image Generation
Generates or edits images using a protocol compatible with the OpenAI Images protocol. If no input image is provided, the request performs text-to-image generation; if an image is provided, it performs image editing.
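The mode switch described above depends only on whether an input image is present. A minimal sketch, assuming the request fields are named model, prompt, and image as in the OpenAI Images protocol (this page does not list the field names, so treat them as assumptions):

```python
from typing import List, Optional


def build_body(model: str, prompt: str, images: Optional[List[str]] = None) -> dict:
    """Assemble a minimal request body (field names assumed, OpenAI-style)."""
    body = {"model": model, "prompt": prompt}
    if images:
        # One image is sent as a plain string, several as an array.
        body["image"] = images[0] if len(images) == 1 else images
    return body


def request_mode(body: dict) -> str:
    """Text-to-image when no image is present, edit mode otherwise."""
    return "edit" if body.get("image") else "generate"


print(request_mode(build_body("openai/gpt-image-2", "a lighthouse at dusk")))  # generate
print(request_mode(build_body("nano-banana-2", "add a moon", ["aGk="])))  # edit
```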
Authorizations
API key used for request authentication. Obtain an API key before using the API. Note: A payment method is required to use the API.
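A minimal sketch of attaching the key to a request with only the Python standard library. The endpoint URL is a placeholder because this reference does not state the base URL or path, and the `Authorization: Bearer` scheme is an assumption carried over from the OpenAI Images protocol.

```python
import json
import urllib.request

# Hypothetical endpoint: the real base URL and path are not given in this reference.
ENDPOINT = "https://api.example.com/v1/images/generations"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed bearer-token scheme, as in the OpenAI Images API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a call to `urllib.request.urlopen(req)`; the sketch stops short of that so it runs offline.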
Body
Request body for the Image Generation API. The request performs text-to-image generation when no image is provided and image editing when one is. Parameters marked (GPT only) or (Gemini only) apply only to that model family; parameters the selected model does not support are ignored.
The model to use. One of openai/gpt-image-2, openai/gpt-image-1-mini, google/gemini-3-pro-image, or google/gemini-3.1-flash-image. The aliases nano-banana-pro (= google/gemini-3-pro-image) and nano-banana-2 (= google/gemini-3.1-flash-image) are also accepted.
The text prompt describing the desired image or edit. The maximum length depends on the model.
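The alias table and model list above are enough to normalize a model name and decide which parameter family applies. A small sketch; the helper names are hypothetical, and classifying by the `openai/` prefix is an assumption based on the model IDs listed.

```python
ALIASES = {
    "nano-banana-pro": "google/gemini-3-pro-image",
    "nano-banana-2": "google/gemini-3.1-flash-image",
}
MODELS = {
    "openai/gpt-image-2",
    "openai/gpt-image-1-mini",
    "google/gemini-3-pro-image",
    "google/gemini-3.1-flash-image",
}


def resolve_model(name: str) -> str:
    """Map an alias to its canonical ID and reject unknown models."""
    canonical = ALIASES.get(name, name)
    if canonical not in MODELS:
        raise ValueError(f"unknown model: {name}")
    return canonical


def model_family(name: str) -> str:
    # Decides whether "(GPT only)" or "(Gemini only)" parameters apply.
    return "gpt" if resolve_model(name).startswith("openai/") else "gemini"
```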
Reference or source image(s) for editing, base64-encoded either as a data URL (with the data:...;base64, prefix) or as a raw base64 string. Providing an image switches the request to edit mode. Pass an array to supply multiple reference images; per-model limits on the number of images apply.
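Both accepted encodings can be produced with the standard `base64` module. A sketch, assuming PNG input by default; the MIME type is a parameter you would change for JPEG sources, and the helper names are hypothetical.

```python
import base64
from typing import List, Union


def encode_image(raw: bytes, mime: str = "image/png", as_data_url: bool = True) -> str:
    """Encode image bytes as a data URL or as a raw base64 string."""
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{b64}" if as_data_url else b64


def image_field(images: List[bytes]) -> Union[str, List[str]]:
    # A single image as a string; multiple reference images as an array.
    encoded = [encode_image(img) for img in images]
    return encoded[0] if len(encoded) == 1 else encoded
```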
(GPT only) Mask image, base64-encoded PNG matching the input size. Transparent areas are repainted. Effective only when image is provided.
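Because the mask must be a PNG matching the input size, a client can check its dimensions before sending. This sketch reads width and height from the PNG IHDR header only (it does not verify chunk CRCs or decode pixels); obtaining the input image's size is left to the caller, and the helper names are hypothetical.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) from a PNG's IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])


def strip_data_url(value: str) -> str:
    # Accept both "data:image/png;base64,..." and raw base64.
    return value.split(",", 1)[1] if value.startswith("data:") else value


def check_mask(mask_b64: str, input_size: tuple) -> None:
    """Raise if the mask is not a PNG of the same size as the input image."""
    size = png_size(base64.b64decode(strip_data_url(mask_b64)))
    if size != tuple(input_size):
        raise ValueError(f"mask is {size}, input is {tuple(input_size)}")
```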
(GPT only) Number of images to generate. For Gemini, request multiple images in the prompt instead.
Range: 1 <= x <= 10.
Output size as width x height (e.g. 1024x1024) or auto. GPT accepts arbitrary sizes; Gemini maps the requested size to the nearest supported aspect ratio.
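A sketch of validating the image count and size string, plus one plausible way to pick the "nearest" aspect ratio for Gemini. The candidate ratio list is hypothetical (this reference does not list Gemini's supported ratios), and comparing log-ratios is an assumed metric, not necessarily what the server does.

```python
import math
import re
from typing import Optional, Sequence, Tuple

SIZE_RE = re.compile(r"^(\d+)x(\d+)$")
# Hypothetical candidates; the real supported ratios are not listed here.
CANDIDATE_RATIOS = ("1:1", "3:4", "4:3", "9:16", "16:9")


def check_count(n: int) -> int:
    if not 1 <= n <= 10:
        raise ValueError("number of images must be between 1 and 10")
    return n


def parse_size(size: str) -> Optional[Tuple[int, int]]:
    """Return (width, height), or None for "auto"."""
    if size == "auto":
        return None
    m = SIZE_RE.match(size)
    if not m or int(m.group(1)) == 0 or int(m.group(2)) == 0:
        raise ValueError(f"size must be WIDTHxHEIGHT or auto, got {size!r}")
    return int(m.group(1)), int(m.group(2))


def nearest_ratio(width: int, height: int, candidates: Sequence[str] = CANDIDATE_RATIOS) -> str:
    # Compare ratios on a log scale so 2:1 and 1:2 are equally far from 1:1.
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        a, b = map(int, ratio.split(":"))
        return abs(math.log(a / b) - target)

    return min(candidates, key=distance)
```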
(GPT only) Generation quality. Higher quality is more detailed, slower, and more expensive.
Allowed values: low, medium, high, auto.
(GPT only) Background setting. gpt-image-2 does not support transparent backgrounds and returns a 400 error for transparent.
Allowed values: transparent, opaque, auto.
Output image format. GPT supports png, jpeg, and webp; Gemini supports png and jpeg (webp falls back to png).
Allowed values: png, jpeg, webp.
(GPT only) Compression level, as a percentage. Effective only when output_format is jpeg or webp.
Range: 0 <= x <= 100.
(GPT only) Content moderation strength. low relaxes the limits; auto is the default.
Allowed values: low, auto.
(Gemini only) Thinking (reasoning) effort before generation.
Allowed values: minimal, high.
(Gemini only) Controls the returned modalities. ["text","image"] returns the image plus accompanying text; ["image"] returns only the image.
Allowed values: text, image.
(Gemini only) Processing resolution for input reference images.
Allowed values: low, medium, high.
Return format. Only base64 is supported.
Allowed values: b64_json.
(GPT only) Whether to stream the response as server-sent events (SSE).
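The cross-parameter rules above (transparent backgrounds on gpt-image-2, compression only for jpeg/webp, the Gemini webp fallback, GPT-only parameters) can be checked client-side before sending. A sketch under stated assumptions: the keys background, output_compression, quality, and similar follow OpenAI Images naming and are not confirmed by this page, png is assumed as the default format, and model names are expected in canonical form.

```python
from typing import List

# Assumed OpenAI-style names for the parameters marked (GPT only).
GPT_ONLY = {"mask", "n", "quality", "background", "output_compression",
            "moderation", "stream", "partial_images"}


def effective_format(model: str, output_format: str) -> str:
    # Gemini does not produce webp and falls back to png.
    if model.startswith("google/") and output_format == "webp":
        return "png"
    return output_format


def lint_body(body: dict) -> List[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    problems = []
    model = body.get("model", "")
    if model == "openai/gpt-image-2" and body.get("background") == "transparent":
        problems.append("gpt-image-2 rejects background=transparent with HTTP 400")
    fmt = effective_format(model, body.get("output_format", "png"))
    if "output_compression" in body and fmt not in ("jpeg", "webp"):
        problems.append("output_compression has no effect unless output_format is jpeg or webp")
    if model.startswith("google/"):
        ignored = sorted(GPT_ONLY.intersection(body))
        if ignored:
            problems.append(f"ignored for Gemini: {', '.join(ignored)}")
    return problems
```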
(GPT only) Number of intermediate preview images to stream. Effective only when stream=true. Each preview costs an extra 100 image output tokens.
Range: 0 <= x <= 3.
A unique identifier for the end user. Use hashed or pseudonymous identifiers to avoid sending personally identifiable information.
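The preview surcharge is simple arithmetic (previews × 100 extra image output tokens), and a pseudonymous user identifier can be derived with a keyed hash. A sketch; the helper names and the `secret` argument are hypothetical, and it assumes the surcharge is counted per preview as stated above.

```python
import hashlib
import hmac

PREVIEW_TOKEN_COST = 100  # extra image output tokens per streamed preview


def preview_overhead(partial_images: int, stream: bool) -> int:
    """Extra image output tokens spent on previews."""
    if not 0 <= partial_images <= 3:
        raise ValueError("partial_images must be between 0 and 3")
    # Previews are produced only when stream=true.
    return partial_images * PREVIEW_TOKEN_COST if stream else 0


def pseudonymous_user(user_id: str, secret: bytes) -> str:
    # HMAC-SHA256 with a secret you keep, so the ID cannot be recovered
    # by hashing guessed user IDs.
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Using HMAC rather than a bare SHA-256 is a design choice: a plain hash of an email address can be reversed by anyone who hashes candidate addresses, while the keyed version stays stable for your own analytics without exposing the raw identifier.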
Response
Successful image response. When stream=false, the response is a single object. When stream=true (GPT only), the response is an SSE stream of image_generation.partial_image / image_edit.partial_image preview events, followed by an image_generation.completed / image_edit.completed event.
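A sketch of consuming the stream: it parses standard SSE framing (event: and data: lines, events separated by a blank line) and sorts events by the names listed above. It assumes each data payload is a single JSON object carrying a type field; the payload's other fields are not specified here.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def _event(name: Optional[str], data_lines: List[str]) -> Tuple[str, dict]:
    payload = json.loads("\n".join(data_lines))
    # Fall back to the payload's own "type" when no "event:" line was sent.
    return name or payload.get("type", "message"), payload


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from the text lines of an SSE stream."""
    name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends one event
            if data_lines:
                yield _event(name, data_lines)
            name, data_lines = None, []
        elif line.startswith("event:"):
            name = line[6:].strip()
        elif line.startswith("data:"):
            value = line[5:]
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields are ignored.
    if data_lines:
        yield _event(name, data_lines)


def collect(lines: Iterable[str]) -> Tuple[List[dict], Optional[dict]]:
    """Split a stream into preview payloads and the final completed payload."""
    previews, final = [], None
    for name, payload in iter_sse(lines):
        if name.endswith(".partial_image"):
            previews.append(payload)
        elif name.endswith(".completed"):
            final = payload
    return previews, final
```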
A non-streaming image response. Images are always returned as base64; no URL is returned.
The unique identifier for this request.
Unix timestamp (in seconds) of when the request was created.
The generated images, one element per image.
Token usage information.
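Since images always come back as base64, decoding is a single `base64.b64decode` per element. This sketch assumes OpenAI-style field names (data, b64_json, created); the page describes these fields but does not name them all.

```python
import base64
from datetime import datetime, timezone
from typing import List


def decode_images(response: dict) -> List[bytes]:
    # Assumed shape: {"data": [{"b64_json": "..."}, ...]}, as in the OpenAI Images protocol.
    return [base64.b64decode(item["b64_json"]) for item in response.get("data", [])]


def created_at(response: dict) -> datetime:
    # "created" is a Unix timestamp in seconds.
    return datetime.fromtimestamp(response["created"], tz=timezone.utc)
```

To save the results, write each element of `decode_images(resp)` with `pathlib.Path(...).write_bytes(blob)`.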
(GPT only) The background setting actually applied.
Possible values: transparent, opaque.
The output format actually applied; for Gemini, derived from the mimeType.
Possible values: png, jpeg, webp.
(GPT only) The quality tier actually applied.
Possible values: low, medium, high.
(GPT only) The output size actually applied; useful when size was auto.
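The applied output format is what should decide the file extension, since Gemini may return png when webp was requested. A sketch that prefers the reported format and falls back to sniffing well-known magic bytes (PNG signature, JPEG SOI marker, RIFF/WEBP header); the output_format key name is taken from this page, the rest is hypothetical.

```python
from typing import Optional


def sniff_format(blob: bytes) -> Optional[str]:
    """Identify png/jpeg/webp from the file's leading bytes."""
    if blob.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if blob.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if blob[:4] == b"RIFF" and blob[8:12] == b"WEBP":
        return "webp"
    return None


def file_extension(response: dict, blob: bytes) -> str:
    # Prefer the reported format, then the bytes themselves, then png.
    ext = response.get("output_format") or sniff_format(blob) or "png"
    return "jpg" if ext == "jpeg" else ext
```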
(Gemini only) Plain-text summary of text the model produced alongside the image. For exact interleaving, read parts.
(Gemini only) Content parts in the model's original order, preserving text/image interleaving.
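Reading parts in order preserves the model's text/image interleaving. The shape of each part is not specified here, so this sketch assumes Gemini-style parts, either {"text": ...} or {"inlineData": {"mimeType": ..., "data": <base64>}}; adjust the keys if the actual response differs.

```python
import base64
from typing import List, Tuple, Union


def walk_parts(response: dict) -> List[Tuple[str, Union[str, bytes]]]:
    """Return (kind, content) pairs in original order: text or decoded image bytes."""
    out = []
    for part in response.get("parts", []):
        if "text" in part:
            out.append(("text", part["text"]))
        elif "inlineData" in part:  # assumed key, mirroring the Gemini API
            blob = part["inlineData"]
            out.append((blob.get("mimeType", "application/octet-stream"),
                        base64.b64decode(blob["data"])))
    return out
```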