Learn how to generate captions, tags and descriptions for images.
vlm-1
can be used to generate structured data from images. In this example, we’ll use vlm-1
to generate captions, tags, and descriptions for a set of images of different products. This structured data can then be used to create a product catalog that can be searched, filtered, and analyzed in various ways.
For this example, we’re going to use a small fashion dataset ashraq/fashion-product-images-small
Preview of the 'fashion-product-images-small' dataset from Huggingface.
vlm-1
can automatically generate descriptions for products based on the images provided. This can be useful for creating detailed product listings, search results, or other content that requires structured descriptions of products. First let’s create a custom schema that will be used to generate the descriptions.
vlm-1
to extract product cataloging information directly from images that conform to this schema. The extracted data will be validated against the schema you defined, ensuring that it conforms to the expected structure and types.
We support querying the API via RESTful endpoints, or using the OpenAI Python SDK with our OpenAI-Compatible API.
Captioning
or Description Generation
): Here, the API has generated a detailed description of the product, including the type of shirt, its features, and the colors and patterns it has. This can be useful for creating detailed product listings or search results for the product. This is a typical use-case for the Captioning
or Description Generation
task.Classification
or Tagging
): The API has also identified the category of the product as “Apparel”. This can be useful for categorizing products in a catalog or search results. This is a typical use-case for the Classification
or Tagging
task.Classification
or Tagging
): The API has identified the season the product is intended for as “Summer”. This can be useful for filtering products by season or for creating seasonal collections. This is a typical use-case for the Classification
or Tagging
task, however, the one additional feature is that we have a Literal
type that restricts the possible values to a predefined set.Classification
or Tagging
): The API has identified the gender the product is intended for as “Men”. This can be useful for filtering products by gender or audience. This is similar to the Season
task, but with a different set of possible values.`vlm-1` predictions for the fashion dataset.