Documentation Index

Fetch the complete documentation index at: https://docs.vlm.run/llms.txt

Use this file to discover all available pages before exploring further.

VLM Run exposes four distinct entry points into our models, each tuned for a different kind of workload, ranging from single-shot structured extraction through fully agentic, multi-step pipelines to interactive chat. This doc walks through each method, explains when to reach for it, and ends with a side-by-side comparison.

1. Requests

Model: vlm-1

Requests are the simplest way to use VLM Run: provide a single file and a domain, and get structured JSON back. They’re designed for ETL-style workloads where you have a fixed prompt (the domain) and want flexibility on the schema.
  • Input: a single document, image, audio file, or video
  • Output: JSON
  • Execution: batch. Submit a request and poll for the prediction by ID
  • Best for: single-step extraction at scale (invoices, receipts, IDs, medical forms, etc.)
from pathlib import Path
from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.document.generate(
    file=Path("invoice.pdf"),
    model="vlm-1",
    domain="document.invoice",
)
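
Because Requests run in batch, the response carries a prediction ID that you poll until the run settles. Here is a minimal sketch continuing the example above; the predictions accessor and its fields are assumptions about the SDK surface, so check the SDK reference for the exact helpers:
import time

# Poll the prediction by ID until it settles. `client.predictions.get`,
# `.status`, and `.response` are assumed names, not verified SDK surface.
prediction = client.predictions.get(response.id)
while prediction.status not in ("completed", "failed"):
    time.sleep(2)
    prediction = client.predictions.get(response.id)
print(prediction.response)  # structured JSON for the document.invoice domain
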
Think of Requests as the “one file in, one JSON out” primitive. Fixed prompt, flexible schema, no orchestration.

2. Executions

Model: vlmrun-orion-1

Executions are for agentic, multi-step workloads. Where a Request is one model call against one file, an Execution runs an agent that can classify, extract, redact, transform, and combine outputs across multiple files, all orchestrated through a skill.
  • Input: multiple documents, images, and/or videos
  • Output: JSON
  • Execution: batch. Submit an execution and poll for the result by ID
  • Configured via: a skill, defined primarily by a SKILL.md file, with optional reference files, schemas, and examples
  • Best for: anything open-ended or multi-step (document packages, cross-file reasoning, redaction pipelines, classification-then-extraction flows)
Skills are the unit of configuration here: SKILL.md gives the agent its instructions, and supporting files (schemas, examples, reference docs) ground its behavior. This is the most powerful and flexible surface we offer.
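
The submission shape mirrors Requests: upload your files, point at a skill, and poll the execution by ID. The following is a hypothetical sketch only; the method names (client.agent.execute, client.executions.get) are assumptions for illustration, not the documented interface:
from pathlib import Path
from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Submit a multi-file job against a skill (its behavior lives in SKILL.md).
# `client.agent.execute` is a hypothetical method name for illustration.
execution = client.agent.execute(
    name="redact-document-package",  # hypothetical skill name
    inputs=[Path("lease.pdf"), Path("drivers-license.jpg")],
)

# Batch semantics: poll the execution by ID until it completes.
result = client.executions.get(execution.id)  # hypothetical accessor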

3. Chat Completions API

Model: vlmrun-orion-1

The Chat Completions API is a drop-in replacement for the OpenAI Chat Completions API. Point the OpenAI SDK at our base URL and you’re using Orion with full visual-tool and artifact support, with no other code changes required.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>",
)
  • Input: multiple files via standard chat messages (text + image/file parts)
  • Output: Text, JSON
  • Execution: both streaming and non-streaming
  • Logging: programmatic calls are logged in the chat completions table
  • Best for: interactive or conversational multimodal use cases, and any app already built against the OpenAI SDK
This is the lowest-friction path for teams already using the OpenAI Chat Completions API: you get Orion’s capabilities with the API shape your code already speaks.
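
Once the client points at our base URL, a standard multimodal call works unchanged. A sketch using the stock OpenAI message format (the image URL is a placeholder, not a real asset):
completion = client.chat.completions.create(
    model="vlmrun-orion-1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the line items in this invoice?"},
                # placeholder URL for illustration only
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
Pass stream=True to the same call to stream tokens instead of waiting for the full response.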

4. Chat UI

Model: vlmrun-orion-1

The Chat UI (chat.vlm.run) is our hosted chat interface. It’s powered by the same Chat Completions API under the hood, giving you Orion with visual tools and artifacts in a browser, with no code required.
  • Input: files and messages via the web UI
  • Output: Text, JSON (artifacts rendered in the browser)
  • Logging: Chat UI sessions are not logged in the chat completions table (only programmatic API calls are)
  • Best for: exploration, demos, one-off tasks, and iterating on prompts before writing code

Summary Table

|  | Requests | Executions | Chat Completions | Chat UI |
| --- | --- | --- | --- | --- |
| Model | vlm-1 | vlmrun-orion-1 | vlmrun-orion-1 | vlmrun-orion-1 |
| Input | Single file (doc, image, audio, or video) | Multiple files | Multiple files | Multiple files |
| Output | JSON | JSON | Text, JSON | Text, JSON |
| Mode | Batch | Batch | Streaming + non-streaming | Streaming |
| Prompt model | Fixed prompt, flexible schema | Open-ended; defined by SKILL.md + reference files | Free-form messages | Free-form messages |
| Visual tool calling |  |  | ✓ | ✓ |
| Artifacts |  |  | ✓ | ✓ |
| Visual grounding (bounding boxes) |  |  |  |  |
| Workload shape | Single-step ETL | Multi-step / agentic | Conversational / multimodal | Conversational / multimodal |
| Best for | High-volume structured extraction | Complex multi-file reasoning and actions | Apps built on the OpenAI chat completions API | Playground for exploration and demos |
| Example usage | Known document types with fixed output schemas (invoices, receipts, IDs) | Custom multi-file pipelines (classify, extract, redact across a document package) | Multimodal chatbots and OpenAI-SDK apps using Orion | Exploring a new domain or iterating on a skill quickly |