The client.image object allows you to process images and extract structured data.

Generate Predictions

from PIL import Image

from vlmrun.client import VLMRun
from vlmrun.hub.schemas.document.invoice import Invoice
from vlmrun.client.types import PredictionResponse, GenerationConfig

# Initialize the client
client = VLMRun()

# Process an image with a predefined schema
image: Image.Image = Image.open("path/to/image.jpg")
response = client.image.generate(
    images=[image],
    domain="document.invoice",
)

# Process the same image with a custom JSON schema passed via GenerationConfig
response: PredictionResponse = client.image.generate(
    images=[image],
    domain="document.invoice",
    config=GenerationConfig(
        json_schema={...}
    )
)
print(response)

Generate Predictions with a Custom Schema

Suppose you want to classify images into one of three categories: tv, document, or other. Define a custom schema as a Pydantic model, then pass its JSON schema to the json_schema parameter of GenerationConfig:

from typing import Literal

from PIL import Image
from pydantic import BaseModel, Field

from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, PredictionResponse


class ImagePrediction(BaseModel):
    label: Literal["tv", "document", "other"] = Field(..., title="Class label for the image.")
    caption: str = Field(..., title="Caption for the image.")

# Initialize the client
client = VLMRun()

# Load the image, and process it with the custom schema
image: Image.Image = Image.open("path/to/image.jpg")
response: PredictionResponse = client.image.generate(
    images=[image],
    domain="image.classification",
    config=GenerationConfig(
        json_schema=ImagePrediction.model_json_schema()
    )
)
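To make the role of the schema more concrete, the following is a minimal sketch that uses only Pydantic and makes no API call. It inspects the JSON schema generated from ImagePrediction and validates a result against the model. The payload dictionaries are hypothetical stand-ins for structured output; how the SDK exposes the returned data on the response object is not shown here and should be checked against the SDK reference.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ImagePrediction(BaseModel):
    label: Literal["tv", "document", "other"] = Field(..., title="Class label for the image.")
    caption: str = Field(..., title="Caption for the image.")


# This dictionary is what gets passed as json_schema in GenerationConfig
schema = ImagePrediction.model_json_schema()
required_fields = sorted(schema["required"])

# Hypothetical payload standing in for structured output (an assumption, not real API output)
payload = {"label": "tv", "caption": "A television on a stand."}
prediction = ImagePrediction.model_validate(payload)
print(prediction.label, "-", prediction.caption)

# A label outside the three allowed categories fails validation
try:
    ImagePrediction.model_validate({"label": "car", "caption": "A parked car."})
    rejected = False
except ValidationError:
    rejected = True
print("out-of-schema label rejected:", rejected)
```

Validating the returned data against the same model that produced the schema gives typed attribute access and catches any output that falls outside the declared categories.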

Get Usage

from vlmrun.client.types import CreditUsage

# Credit usage for the request, read from a response returned by client.image.generate
usage: CreditUsage = response.usage
print(usage)

Image Utilities

The VLM Run SDK provides several image-processing utilities for encoding and downloading images.

from PIL import Image

from vlmrun.common.image import encode_image
from vlmrun.common.utils import download_image

# Convert image to base64
image = Image.open("image.jpg")
base64_str = encode_image(image, format="PNG")

# Download image from URL
image: Image.Image = download_image("https://example.com/image.jpg")
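For readers unfamiliar with base64 image encoding, here is a minimal standard-library sketch of the general idea. It is not the SDK's implementation: to_data_uri and from_data_uri are hypothetical helpers, and whether encode_image returns a bare base64 string or a data URI is an assumption to verify against the SDK itself.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str = "image/png") -> str:
    """Hypothetical helper: wrap raw image bytes in a base64 data URI."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Hypothetical helper: recover the raw bytes from a base64 data URI."""
    _header, _sep, encoded = uri.partition(",")
    return base64.b64decode(encoded)


# The 8-byte PNG file signature stands in for real image bytes
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
roundtrip = from_data_uri(uri)
print(uri)
```

Base64 turns binary image data into ASCII text, which is why it is commonly used to embed images in JSON request bodies.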