# SDK Overview

> Core concepts and components of the VLM Run Python SDK

The VLM Run SDK lets you extract structured data from unstructured content using vision-language models (VLMs). Whether you're processing invoices, analyzing images, transcribing audio, or extracting insights from video, the SDK provides a unified interface for turning raw media into structured, actionable business data.

## Core Concepts

### Domains & Schemas

In VLM Run, **domains** represent different types of content analysis:

* `document.invoice` - Extract data from invoices
* `image.caption` - Generate a caption for an image
* `audio.transcription` - Transcribe spoken content
* `video.dashcam-analytics` - Analyze dashcam footage

Each domain has an associated **schema** that defines the structured output format.
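
The domain names listed above appear to follow a `<media type>.<task>` pattern. As a minimal sketch under that assumption, a domain string can be split into its two parts, for example to validate or group domain names. The `split_domain` helper is hypothetical and not part of the SDK:

```python
def split_domain(domain: str) -> tuple[str, str]:
    """Split a domain such as "document.invoice" into (media_type, task).

    Assumes the "<media type>.<task>" naming seen in the examples above.
    """
    media_type, sep, task = domain.partition(".")
    if not sep or not media_type or not task:
        raise ValueError(f"expected '<media type>.<task>', got {domain!r}")
    return media_type, task


print(split_domain("document.invoice"))         # ('document', 'invoice')
print(split_domain("video.dashcam-analytics"))  # ('video', 'dashcam-analytics')
```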

### Content Processing Flow

The typical flow for processing content follows these steps:

1. **Prepare content** - File, URL, or in-memory data
2. **Choose domain** - Select appropriate domain for your task
3. **Generate prediction** - Process the content
4. **Handle results** - Work with the structured response

## SDK Structure

The SDK is organized around a central `VLMRun` client that gives you access to all functionality:

```
VLMRun Client
│
├── Content APIs
│   ├── client.image     # Image processing
│   ├── client.document  # Document processing
│   ├── client.audio     # Audio processing
│   └── client.video     # Video processing
│
├── Resource APIs
│   ├── client.files     # File management
│   ├── client.hub       # Domain & schema access
│   └── client.models    # Model information
│
└── Utility APIs
    ├── client.predictions # Prediction management
    └── client.fine_tuning # Model customization
```
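
Because the content APIs are attributes on the client, the appropriate one can be chosen at runtime. The following is a rough sketch rather than SDK functionality: `content_api_name` is a hypothetical helper that guesses the attribute name from a file's MIME type using Python's standard `mimetypes` module, and the mapping of PDFs to `document` is an assumption.

```python
import mimetypes


def content_api_name(path: str) -> str:
    """Guess which content API ("image", "document", "audio", "video") fits a file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine media type of {path!r}")
    top_level = mime.split("/")[0]
    if top_level in ("image", "audio", "video"):
        return top_level
    if mime == "application/pdf":
        return "document"  # assumption: PDFs go through client.document
    raise ValueError(f"unsupported media type {mime!r} for {path!r}")


print(content_api_name("invoice.jpg"))   # image
print(content_api_name("document.pdf"))  # document
print(content_api_name("clip.mp4"))      # video
```

With a real client, `getattr(client, content_api_name(path))` would then resolve to `client.image`, `client.document`, and so on.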

## Working with Media Types

Each media type has its own specialized client with consistent patterns.

### Images

```python
from PIL import Image
from vlmrun.client import VLMRun

client = VLMRun(api_key="<your-api-key>")

# Process an image from a file
image = Image.open("invoice.jpg")
response = client.image.generate(
    images=[image],
    domain="document.invoice"
)

# From a URL
response = client.image.generate(
    urls=["https://example.com/invoice.jpg"],
    domain="document.invoice"
)
```

### Documents

```python
# Process a document
response = client.document.generate(
    file="document.pdf",
    domain="document.invoice"
)
```

### Audio

```python
# Process audio
response = client.audio.generate(
    file="recording.mp3",
    domain="audio.transcription"
)
```

### Video

```python
# Process video
response = client.video.generate(
    file="clip.mp4",
    domain="video.dashcam-analytics"
)
```

## Working with Predictions

All content processing methods return a `PredictionResponse` with a consistent structure:

```python
# Key fields of a prediction response
prediction = client.image.generate(...)

prediction.id          # Unique identifier
prediction.status      # Processing status
prediction.created_at  # Creation timestamp
prediction.response    # Structured results (when complete)
prediction.usage       # Resource usage information
```

### Prediction Statuses

A prediction will have one of these statuses:

* `enqueued` - Waiting to be processed
* `pending` - Ready to start processing
* `running` - Currently being processed
* `completed` - Processing finished successfully
* `failed` - Processing encountered an error
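
When polling or logging, it can help to separate statuses that may still change from those that are final. The sketch below uses the five status strings listed above. The grouping into "terminal" and "in progress" is an assumption based on their descriptions, and `is_terminal` is a hypothetical helper, not an SDK function:

```python
# Status strings taken from the list above; the grouping is an assumption.
TERMINAL_STATUSES = frozenset({"completed", "failed"})
IN_PROGRESS_STATUSES = frozenset({"enqueued", "pending", "running"})


def is_terminal(status: str) -> bool:
    """Return True once a prediction is not expected to change state again."""
    if status in TERMINAL_STATUSES:
        return True
    if status in IN_PROGRESS_STATUSES:
        return False
    raise ValueError(f"unknown prediction status: {status!r}")


print(is_terminal("running"))    # False
print(is_terminal("completed"))  # True
```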

### Handling Async Processing

For content that takes time to process, you can wait for completion:

```python
# Start processing
prediction = client.document.generate(
    file="large-document.pdf",
    domain="document.invoice"
)

# Wait for processing to complete
if prediction.status != "completed":
    prediction = client.predictions.wait(prediction.id)

# Now work with the results
result = prediction.response
```
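
The snippet above reads `prediction.response` without checking the final status, yet `failed` is one of the documented statuses. A defensive wrapper might look like the following sketch. `PredictionFailed` and `response_or_raise` are hypothetical names, and the stand-in object only mimics the `id`, `status`, and `response` attributes described earlier. How `client.predictions.wait` itself behaves on failure or timeout is not covered here.

```python
from types import SimpleNamespace


class PredictionFailed(RuntimeError):
    """Hypothetical exception for predictions that did not complete."""


def response_or_raise(prediction):
    """Return prediction.response, or raise if the prediction is not completed."""
    if prediction.status != "completed":
        raise PredictionFailed(
            f"prediction {prediction.id} ended with status {prediction.status!r}"
        )
    return prediction.response


# Stand-in object with the same attribute names as a prediction response
done = SimpleNamespace(id="pred_1", status="completed", response={"total": 42})
print(response_or_raise(done))  # {'total': 42}
```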

## Using Schemas

Schemas define the structure of prediction responses, providing type-safe access to extracted data.

### Working with Standard Schemas

Every domain has a predefined schema:

```python
# Get structured data from a domain schema
response = client.image.generate(
    images=[image],
    domain="document.invoice"
)

# Access fields in the response
invoice_data = response.response
print(f"Invoice #: {invoice_data.invoice_number}")
print(f"Amount: ${invoice_data.total_amount}")
```

### Using Custom Schemas

You can define your own schema for custom extraction:

```python
from pydantic import BaseModel, Field

# Define a custom schema
class ProductInfo(BaseModel):
    name: str = Field(..., description="Product name")
    price: float = Field(..., description="Product price")
    category: str = Field(..., description="Product category")

# Use the custom schema
response = client.image.generate(
    images=[product_image],
    domain="image.product",
    config={"json_schema": ProductInfo.model_json_schema()}
)
```

## Key Resources

### Files

Manage files for processing:

```python
# Upload a file
file = client.files.upload("document.pdf")

# Use the file in a prediction
prediction = client.document.generate(
    urls=[file.url],
    domain="document.invoice"
)
```

### Hub

Access domains and schemas:

```python
# List available domains
domains = client.hub.list_domains()

# Get the schema for a domain
schema = client.hub.get_schema("document.invoice")
```

### Models

Get information about available models:

```python
# List available models
models = client.models.list()
```

## Common Patterns

### Process & Extract

The most common pattern is processing content and extracting structured data:

```python
# Process and extract in one step
response = client.image.generate(
    images=[image],
    domain="document.invoice",
    autocast=True  # Get a type-safe model
)

# Work with the structured data
invoice = response.response
total_with_tax = invoice.total_amount * 1.1  # Example only: adds an assumed 10% tax
```

### Upload & Process

Another common pattern is uploading files first, then processing them:

```python
# 1. Upload file
file = client.files.upload("invoice.jpg")

# 2. Process the file
prediction = client.image.generate(
    urls=[file.url],
    domain="document.invoice"
)

# 3. Get the results
if prediction.status == "completed":
    invoice_data = prediction.response
```

### Batch Processing

For processing multiple files:

```python
# Submit multiple files for processing
prediction_ids = []
for file_path in file_paths:
    response = client.document.generate(
        file=file_path,
        domain="document.invoice",
        batch=True  # Process asynchronously
    )
    prediction_ids.append(response.id)

# Wait for all results
completed = [client.predictions.wait(prediction_id) for prediction_id in prediction_ids]
```
```
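
In a batch, some predictions may end as `failed` while others complete. The sketch below wraps the loop above in a hypothetical helper, `submit_and_collect`, that splits the finished predictions by final status. It uses only the `document.generate` and `predictions.wait` calls shown on this page and accepts any object exposing them, so the `client` parameter is duck-typed rather than tied to a specific class:

```python
def submit_and_collect(client, file_paths, domain):
    """Submit every file in batch mode, then wait and split results by final status."""
    # Submit everything first so all files are queued before any waiting starts
    prediction_ids = [
        client.document.generate(file=path, domain=domain, batch=True).id
        for path in file_paths
    ]

    completed, failed = [], []
    for prediction_id in prediction_ids:
        prediction = client.predictions.wait(prediction_id)
        if prediction.status == "completed":
            completed.append(prediction)
        else:
            failed.append(prediction)
    return completed, failed
```

Calling `completed, failed = submit_and_collect(client, file_paths, "document.invoice")` would then give two lists that can be processed or retried separately.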

## Next Steps

Now that you understand the core concepts, you can:

* Explore the [Client Reference](/sdk-reference/components/client) for detailed API documentation
* Try the specialized APIs for [Image](/sdk-reference/predictions/image), [Document](/sdk-reference/predictions/document), [Audio](/sdk-reference/predictions/audio), or [Video](/sdk-reference/predictions/video)
* Learn about the [CLI](/sdk-reference/cli) for command-line usage
