Generate comprehensive, contextual captions for images using state-of-the-art vision-language models. Perfect for accessibility, content management, and automated image analysis workflows.

Example image to be captioned.

Image caption
"A classic, light turquoise Volkswagen Beetle with chrome accents is parked on a cobblestone street, set against a warm yellow stucco wall with rustic brown wooden doors and windows."

Usage Example

import openai

# Initialize the OpenAI client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Caption the image
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Generate a detailed caption for this image"},
                {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/car.jpg", "detail": "auto"}},
            ],
        }
    ],
)

# Print the response
print(response.choices[0].message.content)
>> "A classic, light turquoise Volkswagen Beetle with chrome accents is parked on a cobblestone street, set against a warm yellow stucco wall with rustic brown wooden doors and windows."
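Local images can likely be sent with the same request shape if the endpoint follows the OpenAI convention of accepting base64 data URLs in the `image_url` field (an assumption, not confirmed above). The hypothetical helper `image_data_url` below sketches the encoding:

```python
import base64
import mimetypes

def image_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL, the format
    OpenAI-compatible chat APIs accept in `image_url`."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Hypothetical usage with the client from the example above
# (requires a real local file and a valid VLMRUN_API_KEY):
# response = client.chat.completions.create(
#     model="vlm-agent-1",
#     messages=[{
#         "role": "user",
#         "content": [
#             {"type": "text", "text": "Generate a detailed caption for this image"},
#             {"type": "image_url", "image_url": {"url": image_data_url("car.jpg")}},
#         ],
#     }],
# )
```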

FAQ

You can simply ask for a more detailed caption by providing a more detailed prompt. In most cases, you can specify the number of words you want, and the model will generate a caption of roughly that length.
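The length hint above can be built into the prompt programmatically. `caption_prompt` is a hypothetical helper (not part of the VLM Run API) that templates the word count into the text portion of the request:

```python
def caption_prompt(word_count: int) -> str:
    """Build a captioning prompt that asks for a target length in words."""
    return (
        f"Generate a detailed caption for this image "
        f"in roughly {word_count} words."
    )

# Hypothetical usage with the client from the example above:
# response = client.chat.completions.create(
#     model="vlm-agent-1",
#     messages=[{
#         "role": "user",
#         "content": [
#             {"type": "text", "text": caption_prompt(100)},
#             {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/car.jpg"}},
#         ],
#     }],
# )
```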
The model can also generate tags for an image, covering categories such as:
  • Common Objects: person, car, truck, bus, bicycle, motorcycle
  • Scenes: street, building, park, forest, beach, etc.
  • Time-of-Day: morning, afternoon, evening, night
  • Weather: sunny, cloudy, rainy, snowing, etc.
Tags are returned as a list of strings.
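Since the tags come back as a list of strings, a small parser makes the model's reply safe to consume programmatically. `parse_tags` is a hypothetical helper that assumes the reply is either a JSON array or a comma-separated string (neither format is guaranteed by the docs above):

```python
import json

def parse_tags(text: str) -> list[str]:
    """Parse a model reply expected to contain a list of tag strings.

    Tries JSON first (e.g. '["car", "street"]'); falls back to
    comma-splitting for plain-text replies (e.g. 'car, street').
    """
    try:
        tags = json.loads(text)
        if isinstance(tags, list):
            return [str(t).strip() for t in tags]
    except json.JSONDecodeError:
        pass
    return [t.strip() for t in text.split(",") if t.strip()]
```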