Use this file to discover all available pages before exploring further.
Generate comprehensive, contextual captions for images using state-of-the-art vision-language models. Perfect for accessibility, content management, and automated image analysis workflows.
This is an example of the response from the Chat Completions API example (using the image shown above):
A classic, light turquoise Volkswagen Beetle with chrome accents is parked on a cobblestone street, set against a warm yellow stucco wall with rustic brown wooden doors and windows.Tags: car, volkswagen, beetle, street, cobblestone, wooden, doors, windows
How do I ask the model for more detailed captions?
You can ask simply ask for a more detailed caption by providing a more detailed prompt. In most cases, you can provide the number of words you want the caption to be, and the model will generate a more detailed caption.
What tags are supported?
Common Objects: person, car, truck, bus, bicycle, motorcycle
Scenes: street, building, park, forest, beach, etc.