Cataloging Images

While most traditional computer-vision models are specialized for a single task such as image classification, captioning, or tagging, vlm-1 can generate a wide range of structured outputs from an image simultaneously: captions, tags, descriptions, and other structured data for cataloging, search, retrieval, and similar applications.

Cataloging Product Images

Let’s walk through a product cataloging example to see how vlm-1 generates structured data from images. We’ll use vlm-1 to generate captions, tags, and descriptions for a set of product images. The resulting structured data can then populate a product catalog that can be searched, filtered, and analyzed. For this example, we use a small fashion dataset from Hugging Face, ashraq/fashion-product-images-small.

Preview of the 'fashion-product-images-small' dataset from Hugging Face.

1. Define a custom schema for cataloging

In the sections below, we showcase a few notable features of the API for image cataloging. vlm-1 can automatically generate product descriptions from the images provided, which is useful for detailed product listings, search results, or any other content that requires structured product descriptions. First, let’s create a custom schema that defines the fields vlm-1 should generate.

from typing import Literal
from pydantic import BaseModel, Field

class ProductCatalog(BaseModel):
    description: str = Field(..., description="A 2-sentence general visual description of the product embedded as an image.")
    category: str = Field(..., description="One- or two-word category of the product (e.g., Apparel, Accessories, Footwear).")
    season: Literal["Fall", "Spring", "Summer", "Winter"] = Field(..., description="The season the product is intended for.")
    gender: Literal["Men", "Women", "Kids"] = Field(..., description="Gender or audience the product is intended for.")
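Since the schema is passed to the API as `ProductCatalog.model_json_schema()` in the extraction step below, it can help to inspect the JSON Schema that Pydantic generates. A minimal sketch, assuming Pydantic v2: the `Literal` fields become JSON Schema `enum` constraints, and all four fields are required because they are declared with `Field(...)`.

```python
from typing import Literal

from pydantic import BaseModel, Field


class ProductCatalog(BaseModel):
    description: str = Field(..., description="A 2-sentence general visual description of the product embedded as an image.")
    category: str = Field(..., description="One- or two-word category of the product (e.g., Apparel, Accessories, Footwear).")
    season: Literal["Fall", "Spring", "Summer", "Winter"] = Field(..., description="The season the product is intended for.")
    gender: Literal["Men", "Women", "Kids"] = Field(..., description="Gender or audience the product is intended for.")


# Pydantic v2: the JSON Schema that is sent to the API as `json_schema`.
schema = ProductCatalog.model_json_schema()

# Literal fields are rendered as `enum` constraints on a string property.
print(schema["properties"]["season"]["enum"])  # ['Fall', 'Spring', 'Summer', 'Winter']
print(schema["properties"]["gender"]["enum"])  # ['Men', 'Women', 'Kids']

# Fields declared with Field(...) have no default, so they are all required.
print(sorted(schema["required"]))
```

The `enum` lists are what let the model (and any downstream validator) restrict predictions to the allowed labels.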

2. Extract cataloging information from images

Once you have defined your custom schema, you can use vlm-1 to extract product cataloging information from images directly into that schema. The extracted data is validated against the schema you defined, ensuring that it has the expected structure and types. You can query the API via RESTful endpoints, via the OpenAI Python SDK with our OpenAI-compatible API, or, as shown here, via the vlmrun Python client.

from datasets import load_dataset
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, GenerationConfig

# Initialize the client
client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Load a 1% slice of the dataset and take the first image (decoded as a PIL image)
ds = load_dataset("ashraq/fashion-product-images-small", split="train[:1%]")
image: Image.Image = ds[0]["image"]

# Predict the product catalog information from the image,
# constrained by the ProductCatalog schema defined above
response: PredictionResponse = client.image.generate(
    images=[image],
    domain="retail.product-catalog",
    config=GenerationConfig(
        json_schema=ProductCatalog.model_json_schema(),
    ),
)
response_dict = response.model_dump()
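The Python client above accepts PIL images directly. If you call the RESTful endpoints instead, you will likely need to send the image inside a JSON payload as a base64-encoded string. A minimal sketch of that encoding step, assuming a PNG data URL is acceptable; `to_data_url` is a hypothetical helper, and the exact request field the REST API expects is not shown here (check the API reference).

```python
import base64
import io

from PIL import Image


def to_data_url(image: Image.Image, fmt: str = "PNG") -> str:
    """Hypothetical helper: encode a PIL image as a base64 data URL."""
    buffer = io.BytesIO()
    # Convert to RGB so formats that cannot store alpha or palette modes (e.g. JPEG) still save.
    image.convert("RGB").save(buffer, format=fmt)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/{fmt.lower()};base64,{encoded}"


# Usage with a small synthetic image standing in for a dataset image.
sample = Image.new("RGB", (8, 8), color=(20, 40, 120))
url = to_data_url(sample)
print(url[:22])  # data:image/png;base64,
```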

Example Product Cataloging Prediction

Let’s take a look at the sample output from the API for the first image in the dataset, a navy plaid shirt. The API generates a detailed description of the product, along with the category, season, and gender it is intended for. This structured data can be used to create a product listing or search results for the product.

{
  "description": "A casual, button-up plaid shirt with short sleeves in a light fabric. The shirt features a combination of blue and white colors in a checkered pattern.",
  "category": "Apparel",
  "season": "Summer",
  "gender": "Men"
}

Let’s break down the output by task:

Description (Captioning or Description Generation): The API generated a detailed description of the product, covering the type of shirt, its features, and its colors and pattern. This is useful for detailed product listings or search results, and is a typical use-case for the Captioning or Description Generation task.
Category (Classification or Tagging): The API identified the product category as “Apparel”, which is useful for organizing products in a catalog or in search results. This is a typical use-case for the Classification or Tagging task.
Season (Classification or Tagging): The API identified the intended season as “Summer”, which is useful for filtering products by season or building seasonal collections. This is also a Classification or Tagging task, with one addition: the Literal type restricts the possible values to a predefined set.
Gender (Classification or Tagging): The API identified the intended audience as “Men”, which is useful for filtering products by gender or audience. Like Season, this field is a Literal, but with a different set of allowed values.
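Because every field is typed, a prediction can also be re-validated on your side before it is stored. A minimal sketch, assuming Pydantic v2 (`model_validate_json`, `model_validate`): the sample output above parses cleanly, while a copy with the illustrative out-of-set season `"Autumn"` is rejected by the `Literal` constraint. The model repeats the schema's fields with descriptions omitted for brevity, and `is_valid` is a hypothetical helper.

```python
import json
from typing import Literal

from pydantic import BaseModel, ValidationError


class ProductCatalog(BaseModel):
    # Same fields and Literal constraints as the schema above; descriptions omitted.
    description: str
    category: str
    season: Literal["Fall", "Spring", "Summer", "Winter"]
    gender: Literal["Men", "Women", "Kids"]


# The sample vlm-1 output shown above.
sample_output = json.dumps({
    "description": "A casual, button-up plaid shirt with short sleeves in a light fabric. "
                   "The shirt features a combination of blue and white colors in a checkered pattern.",
    "category": "Apparel",
    "season": "Summer",
    "gender": "Men",
})

# Parse and validate the JSON string in one step.
product = ProductCatalog.model_validate_json(sample_output)
print(product.category, product.season, product.gender)  # Apparel Summer Men


def is_valid(payload: dict) -> bool:
    """Hypothetical helper: True if the payload satisfies the schema."""
    try:
        ProductCatalog.model_validate(payload)
    except ValidationError as err:
        # Each error reports the offending field location and an error type.
        print([(e["loc"], e["type"]) for e in err.errors()])
        return False
    return True


# An out-of-set label is rejected instead of being silently stored.
bad = json.loads(sample_output) | {"season": "Autumn"}
print(is_valid(bad))
```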

Cataloging larger image catalogs

Once you have validated the output for a single image, you can scale this process to larger image catalogs. You can reuse the same API call to generate structured data for many images, and then use that data to build a product catalog that can be searched, filtered, and analyzed. Better yet, you can ingest the JSON output directly into JSON-compatible stores such as MongoDB, Elasticsearch, or even traditional SQL databases, unlocking a wide range of semantic image-search and querying possibilities for your cataloging needs.

`vlm-1` predictions for the fashion dataset.
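The scale-out and ingestion steps described above can be sketched as follows, under stated assumptions: `predict` is a hypothetical callable that wraps the `client.image.generate(...)` call shown earlier and returns the schema dict for one image; here a stub that returns the sample output for every image stands in for it so the example runs offline. Predictions are stored as JSON text in SQLite (standard-library `sqlite3`) and filtered with SQLite's `json_extract`, which assumes an SQLite build with the JSON functions available (the default in modern builds). `catalog_images` and `ingest` are hypothetical helper names.

```python
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def catalog_images(ids: Iterable[str], predict: Callable[[str], dict], max_workers: int = 4) -> list[tuple[str, dict]]:
    """Run `predict` over many images concurrently; results keep the input order."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(predict, ids))
    return list(zip(ids, results))


def ingest(conn: sqlite3.Connection, rows: list[tuple[str, dict]]) -> None:
    """Store each prediction as a JSON document keyed by image id."""
    conn.execute("CREATE TABLE IF NOT EXISTS catalog (image_id TEXT PRIMARY KEY, data TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO catalog VALUES (?, ?)",
        [(image_id, json.dumps(pred)) for image_id, pred in rows],
    )
    conn.commit()


def stub_predict(image_id: str) -> dict:
    # Hypothetical stand-in for a vlm-1 call: returns the sample output for every image.
    return {
        "description": "A casual, button-up plaid shirt with short sleeves in a light fabric. "
                       "The shirt features a combination of blue and white colors in a checkered pattern.",
        "category": "Apparel",
        "season": "Summer",
        "gender": "Men",
    }


conn = sqlite3.connect(":memory:")
ingest(conn, catalog_images(["img-0", "img-1", "img-2"], stub_predict))

# Filter on fields extracted by vlm-1 using SQLite's JSON functions.
query = (
    "SELECT image_id FROM catalog "
    "WHERE json_extract(data, '$.season') = ? AND json_extract(data, '$.gender') = ? "
    "ORDER BY image_id"
)
print(conn.execute(query, ("Summer", "Men")).fetchall())  # [('img-0',), ('img-1',), ('img-2',)]
print(conn.execute(query, ("Winter", "Men")).fetchall())  # []
```

Threads suit this workload because each prediction is an I/O-bound API call; a production pipeline would typically also add retries and rate limiting, and could target MongoDB or Elasticsearch instead of SQLite.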

Fine-tuning for custom cataloging

For enterprise use-cases that require custom-tailored cataloging tasks or higher accuracy, you can use our fine-tuning guides to tailor the model to your performance and scalability needs. This can include fine-tuning the model on your own data, customizing the model architecture, or adding new capabilities to the model. Fine-tuning can improve the model's accuracy on your specific cataloging tasks, and lighter fine-tuned models optimized for your use-case can also help you process larger volumes of images more efficiently.

This feature is currently only available for our enterprise-tier customers. If you are interested in using this feature, please contact us.

Try our Image -> JSON API today

Head over to our Image -> JSON API to start building your own image processing pipeline with VLM Run. Sign up for access on our platform.
