Image Captioning - Inference.net Documentation

Best fit
Recommended stack
Workflow
Related pages

Use this tutorial when you want reliable captions, alt text, or lightweight image metadata.

Best fit

accessibility alt text
product image descriptions
editorial preview text
image metadata enrichment

Recommended stack

vision-capable model
structured outputs if your app expects typed fields like alt_text
batch when captioning large image collections offline

Workflow

send the image as a data URI
ask for a concise, objective caption
switch to a JSON schema if your app needs structured fields
batch the workflow when the volume grows

Classification with Structured Outputs

⌘I