SDK Overview
The VLM Run SDK enables you to extract structured data from unstructured content using VLMs. Whether you’re processing invoices, analyzing images, transcribing audio, or extracting insights from video, the SDK provides a unified interface to transform raw media into actionable business data.
Core Concepts
Domains & Schemas
In VLM Run, domains represent different types of content analysis, for example:
- document.invoice: Extract data from invoices
- image.caption: Generate a caption for an image
- audio.transcription: Transcribe spoken content
- video.dashcam-analytics: Analyze dashcam footage
Each domain has an associated schema that defines the structured output format.
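The domain names above appear to follow a "<category>.<task>" pattern. Assuming that pattern holds, a small helper like the hypothetical group_domains below can organize a list of domain names by category. The prefix describes the kind of content, not necessarily the client you call (the invoice examples later on send images to document.invoice). The sketch works on plain strings and does not call the SDK.

```python
from collections import defaultdict


def group_domains(domains: list[str]) -> dict[str, list[str]]:
    """Group domain names such as "document.invoice" by their category prefix.

    Assumes the "<category>.<task>" naming seen on this page; the task part
    may contain hyphens (e.g. "dashcam-analytics").
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for domain in domains:
        # Split on the first dot only: "video.dashcam-analytics" -> ("video", "dashcam-analytics")
        category, sep, task = domain.partition(".")
        if not sep or not task:
            raise ValueError(f"unexpected domain format: {domain!r}")
        groups[category].append(task)
    return dict(groups)


print(group_domains([
    "document.invoice",
    "image.caption",
    "audio.transcription",
    "video.dashcam-analytics",
]))
# {'document': ['invoice'], 'image': ['caption'], 'audio': ['transcription'], 'video': ['dashcam-analytics']}
```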
Content Processing Flow
The typical flow for processing content follows these steps:
- Prepare content - File, URL, or in-memory data
- Choose domain - Select appropriate domain for your task
- Generate prediction - Process the content
- Handle results - Work with the structured response
SDK Structure
The SDK is organized around a central VLMRun client that gives you access to all functionality:
VLMRun Client
│
├── Content APIs
│ ├── client.image # Image processing
│ ├── client.document # Document processing
│ ├── client.audio # Audio processing
│ └── client.video # Video processing
│
├── Resource APIs
│ ├── client.files # File management
│ ├── client.hub # Domain & schema access
│ └── client.models # Model information
│
└── Utility APIs
├── client.predictions # Prediction management
└── client.fine_tuning # Model customization
Each media type has its own client, and they all follow the same pattern: call generate() with the content and a domain.
Images
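from PIL import Image
from vlmrun.client import VLMRun
# Create the client (replace the placeholder with your API key)
client = VLMRun(api_key="<your-api-key>")
# Process an image from a file
image = Image.open("invoice.jpg")
response = client.image.generate(
    images=[image],
    domain="document.invoice"
)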
# From a URL
response = client.image.generate(
    urls=["https://example.com/invoice.jpg"],
    domain="document.invoice"
)
Documents
# Process a document
response = client.document.generate(
    file="document.pdf",
    domain="document.form"
)
Audio
# Process audio
response = client.audio.generate(
    file="recording.mp3",
    domain="audio.transcription"
)
Video
# Process video
response = client.video.generate(
    file="clip.mp4",
    domain="video.dashcam-analytics"
)
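Because each media type has its own client, code that accepts arbitrary files has to pick one. The sketch below uses a hypothetical extension-to-API mapping (content_api_for and _API_BY_SUFFIX are illustrative names, not part of the SDK) and returns the matching attribute of the client. It deliberately stops short of calling generate(), since the image client takes images= or urls= while the other examples on this page pass file=.

```python
from pathlib import Path
from types import SimpleNamespace

# Hypothetical extension-to-API mapping; adjust it to the formats you actually use.
_API_BY_SUFFIX = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".pdf": "document",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}


def content_api_for(client, path: str):
    """Return client.image, client.document, client.audio or client.video for a file path."""
    suffix = Path(path).suffix.lower()
    try:
        name = _API_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no content API mapped for {suffix!r}") from None
    return getattr(client, name)


# Offline demo with a stand-in object instead of a real VLMRun client.
demo_client = SimpleNamespace(
    image="image-api", document="document-api", audio="audio-api", video="video-api"
)
print(content_api_for(demo_client, "recording.MP3"))  # audio-api
```

With a real client, content_api_for(client, "clip.mp4") would return client.video, on which you then call generate() with the arguments that client expects.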
Working with Predictions
All content processing methods return a PredictionResponse with a consistent structure:
# Key fields of a prediction response
prediction = client.image.generate(images=[image], domain="document.invoice")
prediction.id # Unique identifier
prediction.status # Processing status
prediction.created_at # Creation timestamp
prediction.response # Structured results (when complete)
prediction.usage # Resource usage information
Prediction Statuses
A prediction will have one of these statuses:
- enqueued: Waiting to be processed
- pending: Ready to start processing
- running: Currently being processed
- completed: Processing finished successfully
- failed: Processing encountered an error
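If you branch on these statuses in your own code, mirroring them in an enum avoids typos in string comparisons. The PredictionStatus class below is a local helper, not an SDK type; it assumes the five strings listed above are the complete set and treats completed and failed as the only final states.

```python
from enum import Enum


class PredictionStatus(str, Enum):
    """Local mirror of the prediction status strings (not an SDK type)."""

    ENQUEUED = "enqueued"
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_terminal(self) -> bool:
        # Only these two states end processing; the others may still change.
        return self in (PredictionStatus.COMPLETED, PredictionStatus.FAILED)


print(PredictionStatus("running").is_terminal)  # False
print(PredictionStatus("failed").is_terminal)   # True
```

Because the enum also subclasses str, PredictionStatus.COMPLETED == "completed" is true, so it can be compared directly against prediction.status.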
Handling Async Processing
Some content takes a while to process, so the returned prediction may not be completed yet. In that case, wait for it to finish:
# Start processing
prediction = client.document.generate(
    file="large-document.pdf",
    domain="document.form"
)
# Wait for processing to complete
if prediction.status != "completed":
    prediction = client.predictions.wait(prediction.id)
# Now work with the results
result = prediction.response
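Conceptually, waiting on a prediction means re-checking its status until it reaches a final state. The sketch below is a generic polling loop with a timeout, not the SDK's implementation of client.predictions.wait(); poll_until_done is a hypothetical name, and fetch stands in for whatever lookup returns a prediction object with a status attribute.

```python
import time
from types import SimpleNamespace
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch: Callable[[str], object],
    prediction_id: str,
    timeout: float = 300.0,
    interval: float = 2.0,
):
    """Re-fetch a prediction until its status is terminal, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(prediction_id)
        if prediction.status in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} still {prediction.status!r}")
        time.sleep(interval)


# Offline demo: a fake lookup that reports "running" twice, then "completed".
_states = iter(["running", "running", "completed"])


def fake_fetch(prediction_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=prediction_id, status=next(_states))


print(poll_until_done(fake_fetch, "pred-1", interval=0.01).status)  # completed
```

A fixed interval keeps the sketch simple; for long jobs, growing the interval between checks reduces the number of status requests.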
Using Schemas
Schemas define the structure of prediction responses, providing type-safe access to extracted data.
Working with Standard Schemas
Every domain has a predefined schema:
# Get structured data from a domain schema
response = client.image.generate(
    images=[image],
    domain="document.invoice"
)
# Access fields in the response
invoice_data = response.response
print(f"Invoice #: {invoice_data.invoice_number}")
print(f"Amount: ${invoice_data.total_amount}")
Using Custom Schemas
You can define your own schema for custom extraction:
from PIL import Image
from pydantic import BaseModel, Field
# Define a custom schema
class ProductInfo(BaseModel):
    name: str = Field(..., description="Product name")
    price: float = Field(..., description="Product price")
    category: str = Field(..., description="Product category")
# Use the custom schema
product_image = Image.open("product.jpg")
response = client.image.generate(
    images=[product_image],
    domain="image.product",
    config={"json_schema": ProductInfo.model_json_schema()}
)
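ProductInfo.model_json_schema() returns a plain JSON Schema dictionary, so the json_schema config value is ordinary data. The sketch below writes a comparable schema by hand (pydantic's output also adds per-field title entries, so the two are not identical) and adds a hypothetical shallow check, schema_problems, that a returned dict has the required fields with the expected top-level types. Passing a hand-written dict as json_schema is an assumption; for thorough validation, pydantic's ProductInfo.model_validate() or a dedicated JSON Schema validator is the better tool.

```python
# Hand-written JSON Schema comparable to ProductInfo (assumption: the
# json_schema config accepts any plain JSON Schema dict).
product_schema = {
    "title": "ProductInfo",
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Product name"},
        "price": {"type": "number", "description": "Product price"},
        "category": {"type": "string", "description": "Product category"},
    },
    "required": ["name", "price", "category"],
}
config = {"json_schema": product_schema}

# JSON Schema "number" accepts ints and floats, but not Python bools.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def schema_problems(data: dict, schema: dict) -> list[str]:
    """Shallow check of a dict against the schema's top-level 'required' and 'type' entries."""
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing {field}")
    for field, spec in schema.get("properties", {}).items():
        check = _TYPE_CHECKS.get(spec.get("type"))
        if field in data and check is not None and not check(data[field]):
            problems.append(f"{field} is not a {spec['type']}")
    return problems


print(schema_problems({"name": "Mug", "price": "9.99"}, product_schema))
# ['missing category', 'price is not a number']
```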
Key Resources
Files
Manage files for processing:
# Upload a file
file = client.files.upload("document.pdf")
# Use the file in a prediction
prediction = client.document.generate(
    urls=[file.url],
    domain="document.form"
)
Hub
Access domains and schemas:
# List available domains
domains = client.hub.list_domains()
# Get the schema for a domain
schema = client.hub.get_schema("document.invoice")
Models
Get information about available models:
# List available models
models = client.models.list()
Common Patterns
The most common pattern is processing content and extracting structured data:
# Process and extract in one step
response = client.image.generate(
    images=[image],
    domain="document.invoice",
    autocast=True  # Get a type-safe model
)
# Work with the structured data
invoice = response.response
total_with_tax = invoice.total_amount * 1.1  # assumes a flat 10% tax rate
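Multiplying a float amount by 1.1 can leave binary rounding noise in the result, which matters when the value is money. A minimal sketch using Python's decimal module is shown below; add_tax is a hypothetical helper, and the default 10% rate simply mirrors the example above.

```python
from decimal import Decimal, ROUND_HALF_UP


def add_tax(amount: float, rate: str = "0.10") -> Decimal:
    """Apply a tax rate to a monetary amount and round to cents.

    Converting through str() keeps the printed value of a float
    (100.0 -> Decimal("100.0")) instead of its full binary expansion.
    """
    total = Decimal(str(amount)) * (1 + Decimal(rate))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(100.0 * 1.1)     # 110.00000000000001
print(add_tax(100.0))  # 110.00
```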
Upload & Process
Another common pattern is uploading files first, then processing them:
# 1. Upload file
file = client.files.upload("invoice.jpg")
# 2. Process the file
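prediction = client.image.generate(
    urls=[file.url],
    domain="document.invoice"
)
# 3. Get the results, waiting first if processing is still in progress
if prediction.status != "completed":
    prediction = client.predictions.wait(prediction.id)
invoice_data = prediction.response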
Batch Processing
For processing multiple files:
# Process multiple files
file_paths = ["form-1.pdf", "form-2.pdf"]  # hypothetical input files
prediction_ids = []
for file_path in file_paths:
    response = client.document.generate(
        file=file_path,
        domain="document.form",
        batch=True  # Process asynchronously
    )
    prediction_ids.append(response.id)
# Wait for all results
completed = [client.predictions.wait(prediction_id) for prediction_id in prediction_ids]
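The list comprehension above waits on one prediction at a time. Since the jobs were submitted asynchronously, the waits can overlap instead. The sketch below uses a thread pool from the standard library; wait_all is a hypothetical helper, and passing client.predictions.wait as the wait argument assumes the client is safe to share across threads, which this page does not state.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def wait_all(wait: Callable[[str], object], prediction_ids: Iterable[str], max_workers: int = 8) -> list:
    """Run a blocking wait function for many ids concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as prediction_ids.
        return list(pool.map(wait, prediction_ids))


# Offline demo: a fake wait function that takes 0.1 s per prediction.
def fake_wait(prediction_id: str) -> str:
    time.sleep(0.1)
    return f"{prediction_id}:completed"


print(wait_all(fake_wait, ["a", "b", "c"]))  # ['a:completed', 'b:completed', 'c:completed']
```

With the real SDK this would read completed = wait_all(client.predictions.wait, prediction_ids), under the thread-safety assumption above.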
Next Steps
Now that you understand the core concepts, you can: