Documentation Index

Fetch the complete documentation index at: https://docs.vlm.run/llms.txt

Use this file to discover all available pages before exploring further.

VLM Run exposes four distinct entry points into our models, each tuned for a different kind of workload, ranging from single-shot structured extraction through fully agentic, multi-step pipelines to interactive chat. This doc walks through each method, explains when to reach for it, and ends with a side-by-side comparison.

1. Requests

Model: vlm-1

Requests are the simplest way to use VLM Run: provide a single file and a domain, and get structured JSON back. They’re designed for ETL-style workloads where you have a fixed prompt (the domain) and want flexibility on the schema.
  • Input: a single document, image, audio file, or video
  • Output: JSON
  • Execution: batch. Submit a request and poll for the prediction by ID
  • Best for: single-step extraction at scale (invoices, receipts, IDs, medical forms, etc.)
from pathlib import Path
from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.document.generate(
    file=Path("invoice.pdf"),
    model="vlm-1",
    domain="document.invoice",
)
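
Because Requests run in batch, the response carries a prediction ID that you poll until the run settles. Here is a minimal sketch continuing the example above; the predictions accessor and its fields are assumptions about the SDK surface, so check the SDK reference for the exact helpers:
import time

# Poll the prediction by ID until it settles. `client.predictions.get`,
# `.status`, and `.response` are assumed names, not verified SDK surface.
prediction = client.predictions.get(response.id)
while prediction.status not in ("completed", "failed"):
    time.sleep(2)
    prediction = client.predictions.get(response.id)
print(prediction.response)  # structured JSON for the document.invoice domain
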
Think of Requests as the “one file in, one JSON out” primitive. Fixed prompt, flexible schema, no orchestration.

2. Executions

Model: vlmrun-orion-1

Executions are for agentic, multi-step workloads. Where a Request is one model call against one file, an Execution runs an agent that can classify, extract, redact, transform, and combine outputs across multiple files, all orchestrated through a skill.
  • Input: multiple documents, images, and/or videos
  • Output: JSON
  • Execution: batch. Submit an execution and poll for the result by ID
  • Configured via: a skill, defined primarily by a SKILL.md file, with optional reference files, schemas, and examples
  • Best for: anything open-ended or multi-step (document packages, cross-file reasoning, redaction pipelines, classification-then-extraction flows)
Skills are the unit of configuration here: SKILL.md gives the agent its instructions, and supporting files (schemas, examples, reference docs) ground its behavior. This is the most powerful and flexible surface we offer.
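
The submission shape mirrors Requests: upload your files, point at a skill, and poll the execution by ID. The following is a hypothetical sketch only; the method names (client.agent.execute, client.executions.get) are assumptions for illustration, not the documented interface:
from pathlib import Path
from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Submit a multi-file job against a skill (its behavior lives in SKILL.md).
# `client.agent.execute` is a hypothetical method name for illustration.
execution = client.agent.execute(
    name="redact-document-package",  # hypothetical skill name
    inputs=[Path("lease.pdf"), Path("drivers-license.jpg")],
)

# Batch semantics: poll the execution by ID until it completes.
result = client.executions.get(execution.id)  # hypothetical accessor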

3. Chat Completions API

Model: vlmrun-orion-1

The Chat Completions API is a drop-in replacement for the OpenAI Chat Completions API. Point the OpenAI SDK at our base URL and you’re using Orion with full visual-tool and artifact support, with no other code changes required.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>",
)
  • Input: multiple files via standard chat messages (text + image/file parts)
  • Output: Text, JSON
  • Execution: both streaming and non-streaming
  • Logging: programmatic calls are logged in the chat completions table
  • Best for: interactive or conversational multimodal use cases, and any app already built against the OpenAI SDK
This is the lowest-friction path for teams already using the OpenAI Chat Completions API: you get Orion’s capabilities with the API shape your code already speaks.
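
Once the client points at our base URL, a standard multimodal call works unchanged. A sketch using the stock OpenAI message format (the image URL is a placeholder, not a real asset):
completion = client.chat.completions.create(
    model="vlmrun-orion-1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the line items in this invoice?"},
                # placeholder URL for illustration only
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
Pass stream=True to the same call to stream tokens instead of waiting for the full response.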

4. Chat UI

Model: vlmrun-orion-1

The Chat UI (chat.vlm.run) is our hosted chat interface. It’s powered by the same Chat Completions API under the hood, giving you Orion with visual tools and artifacts in a browser, with no code required.
  • Input: files and messages via the web UI
  • Output: Text, JSON (artifacts rendered in the browser)
  • Logging: Chat UI sessions are not logged in the chat completions table (only programmatic API calls are)
  • Best for: exploration, demos, one-off tasks, and iterating on prompts before writing code

Summary Table

|  | Requests | Executions | Chat Completions | Chat UI |
| --- | --- | --- | --- | --- |
| Model | vlm-1 | vlmrun-orion-1 | vlmrun-orion-1 | vlmrun-orion-1 |
| Input | Single file (doc, image, audio, or video) | Multiple files | Multiple files | Multiple files |
| Output | JSON | JSON | Text, JSON | Text, JSON |
| Mode | Batch | Batch | Streaming + non-streaming | Streaming |
| Prompt model | Fixed prompt, flexible schema | Open-ended; defined by SKILL.md + reference files | Free-form messages | Free-form messages |
| Visual tool calling |  |  | ✓ | ✓ |
| Artifacts |  |  | ✓ | ✓ |
| Visual grounding (bounding boxes) |  |  |  |  |
| Workload shape | Single-step ETL | Multi-step / agentic | Conversational / multimodal | Conversational / multimodal |
| Best for | High-volume structured extraction | Complex multi-file reasoning and actions | Apps built on the OpenAI chat completions API | Playground for exploration and demos |
| Example usage | Known document types with fixed output schemas (invoices, receipts, IDs) | Custom multi-file pipelines (classify, extract, redact across a document package) | Multimodal chatbots and OpenAI-SDK apps using Orion | Exploring a new domain or iterating on a skill quickly |