vlm-1 can intelligently classify images based on their content, composition, and visual characteristics. This enables robust classification of images into various categories, even when they come in different styles, lighting conditions, or perspectives.
For example, below is a diagram showing how an image can be classified into different types, and how each type can have its own custom post-processing logic.
Classifying TV Images
Let’s look at a TV image classification example to see how vlm-1 can be used to automatically analyze and categorize television content. In this example, we’ll use vlm-1 to classify TV screenshots and frames into categories like news broadcasts, entertainment shows, commercials, and other programming types. This classification enables automated content monitoring, ad detection, and intelligent media archiving by identifying the type of TV content being shown.

Example image that needs classification.
Define a custom schema for image classification
In the sections below, we’ll showcase how to use the API for image classification. vlm-1 can automatically classify images based on their content and visual characteristics, providing both a classification and a rationale for its decision. First, let’s create a custom schema that will be used to classify the images.
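The schema itself is not reproduced here, so the following is only a minimal sketch of what one could look like, written as a JSON Schema held in a Python dict. The three field names come from the sample output described later on this page; the name `TV_CLASSIFICATION_SCHEMA`, the category labels other than `news`, and the confidence levels other than `high` are assumptions made for illustration, and the format the API actually expects may differ.

```python
import json

# Hypothetical category labels; only "news" appears in the sample output.
IMAGE_TYPES = ["news", "entertainment", "commercial", "other"]

# Hypothetical confidence levels; only "high" appears in the sample output.
CONFIDENCE_LEVELS = ["low", "medium", "high"]

# JSON Schema describing a single classification result.
TV_CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "rationale": {
            "type": "string",
            "description": "Why the image was assigned to this category.",
        },
        "image_type": {"type": "string", "enum": IMAGE_TYPES},
        "confidence": {"type": "string", "enum": CONFIDENCE_LEVELS},
    },
    "required": ["rationale", "image_type", "confidence"],
    "additionalProperties": False,
}

if __name__ == "__main__":
    # Print the schema as JSON so it can be inspected or stored.
    print(json.dumps(TV_CLASSIFICATION_SCHEMA, indent=2))
```

Restricting `image_type` and `confidence` to enumerated values keeps downstream code from having to handle free-form labels.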
Classify images
Once you have defined your custom schema, you can use vlm-1 to classify images according to this schema. The classification will be validated against the schema you defined, ensuring that it conforms to the expected structure and types. First, let’s look at an example of how to classify a single image.
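To illustrate what validating a classification against a schema means in practice, here is a small client-side check using only the Python standard library. It is a sketch under assumptions, not the API's own validation: the function `validate_classification`, the allowed category labels other than `news`, and the confidence levels other than `high` are hypothetical, and the raw JSON string stands in for a response body.

```python
import json

# Hypothetical allowed values; only "news" and "high" appear in the sample output.
ALLOWED_VALUES = {
    "image_type": {"news", "entertainment", "commercial", "other"},
    "confidence": {"low", "medium", "high"},
}
REQUIRED_FIELDS = ("rationale", "image_type", "confidence")


def validate_classification(raw: str) -> dict:
    """Parse a JSON response and check its structure and field values."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every required field must be present and be a non-empty string.
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or non-string field: {field}")
    # Enumerated fields must use one of the allowed labels.
    for field, allowed in ALLOWED_VALUES.items():
        if data[field] not in allowed:
            raise ValueError(f"unexpected {field}: {data[field]!r}")
    return data


if __name__ == "__main__":
    # Illustrative payload, not real API output.
    raw = '{"rationale": "A presenter is visible.", "image_type": "news", "confidence": "high"}'
    print(validate_classification(raw)["image_type"])
```

A check like this fails fast on an unexpected label instead of letting it reach later processing steps.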
Sample Image Classification
Let’s take a look at the sample output for a typical TV image.
rationale: A detailed explanation of why the model classified the image as news, based on visual features and content. This allows the developer or user to inspect the classification and make any necessary adjustments downstream of the model.
image_type: The image classification type, in this case news.
confidence: A qualitative confidence level of “high”, indicating strong certainty in the classification based on the clear presence of financial market data and a news presenter.
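Because each image type can have its own post-processing logic, a result with these three fields can be routed to a type-specific handler. The sketch below is one hypothetical way to do that: the handler functions, the `route` helper, the labels other than `news`, and the rule of sending anything below “high” confidence to manual review are all assumptions for illustration, and the sample dictionary is made up to mirror the fields described above.

```python
from typing import Callable, Dict


def handle_news(result: dict) -> str:
    # Hypothetical step: tag the frame for a news archive.
    return f"archive as news: {result['rationale']}"


def handle_commercial(result: dict) -> str:
    # Hypothetical step: record the frame for ad detection.
    return f"log as ad: {result['rationale']}"


def handle_other(result: dict) -> str:
    # Fallback for any category without a dedicated handler.
    return f"store unclassified: {result['rationale']}"


# Map each image_type label to its post-processing function.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "news": handle_news,
    "commercial": handle_commercial,
}


def route(result: dict) -> str:
    """Send a classification result to the handler for its image_type."""
    # Assumed policy: only act automatically on high-confidence results.
    if result.get("confidence") != "high":
        return "queued for manual review"
    handler = HANDLERS.get(result.get("image_type", ""), handle_other)
    return handler(result)


if __name__ == "__main__":
    # Illustrative result, not real API output.
    sample = {
        "rationale": "Financial market data and a news presenter are visible.",
        "image_type": "news",
        "confidence": "high",
    }
    print(route(sample))
```

Keeping the handlers in a dictionary means a new category only needs one new function and one new entry.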
Fine-tuning Image Classification
This feature is currently only available for our enterprise-tier customers. If you are interested in using this feature, please contact us.