Doc -> JSON

Generate a structured prediction for the given document.

Credits Used

CreditUsageResponse

Element Type: The type of element processed (e.g. image, page, video, audio).

Elements Processed

Message

Steps: Number of steps processed, in case of agentic execution.
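Below is a minimal Python sketch of how a client might read the credit-usage object. The snake_case keys (`element_type`, `elements_processed`, `message`, `steps`) are assumptions inferred from the field titles above. The `CreditUsage` class and its `summary` method are hypothetical helpers and are not part of any VLM Run SDK.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class CreditUsage:
    """Hypothetical client-side mirror of CreditUsageResponse.

    ASSUMPTION: the snake_case keys are guessed from the field titles
    (Element Type, Elements Processed, Message, Steps).
    """

    element_type: Optional[str] = None
    elements_processed: Optional[int] = None
    message: Optional[str] = None
    steps: Optional[int] = None

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "CreditUsage":
        # Missing keys become None instead of raising, because not every
        # response is guaranteed to populate every field.
        return cls(
            element_type=data.get("element_type"),
            elements_processed=data.get("elements_processed"),
            message=data.get("message"),
            steps=data.get("steps"),
        )

    def summary(self) -> str:
        # Build a short human-readable line such as "3 page(s) processed".
        count = self.elements_processed if self.elements_processed is not None else 0
        kind = self.element_type or "element"
        text = f"{count} {kind}(s) processed"
        if self.steps is not None:
            # Steps is only meaningful for agentic execution.
            text += f" in {self.steps} step(s)"
        return text


usage = CreditUsage.from_dict({"element_type": "page", "elements_processed": 3})
print(usage.summary())  # 3 page(s) processed
```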

GenerationConfig

Request configuration for image/document/video generation.

Confidence: Include confidence scores in the response (included in the `_metadata` field).

Detail: The detail level to use for processing multimodal data.

Gql Stmt: The GraphQL statement to use for the application. If provided, the response model is generated from the GraphQL statement.

Grounding: Include grounding in the response (included in the `_metadata` field).

Json Schema: A JSON schema to use in place of the response model.

Keyframes: Include keyframes in the video transcription response.

Max Retries: The maximum number of retries to use for the application.

Max Tokens: The maximum number of tokens to use for the application.

Prompt: The prompt to use for the application (currently ignored).

Temperature: The temperature to use for the application.
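The sketch below represents the GenerationConfig fields as a Python dataclass that omits unset options when it is serialized. The key names (`gql_stmt`, `json_schema`, `max_tokens`, and so on) are snake_case guesses based on the field titles. The class is a hypothetical client-side helper. Server-side defaults are not documented here, so the sketch does not assume any.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GenerationConfig:
    """Hypothetical client-side mirror of GenerationConfig.

    ASSUMPTION: key names are snake_case guesses from the field titles.
    Every field defaults to None so unset options are left to the server.
    """

    confidence: Optional[bool] = None
    detail: Optional[str] = None
    gql_stmt: Optional[str] = None
    grounding: Optional[bool] = None
    json_schema: Optional[Dict[str, Any]] = None
    keyframes: Optional[bool] = None
    max_retries: Optional[int] = None
    max_tokens: Optional[int] = None
    prompt: Optional[str] = None  # documented as currently ignored
    temperature: Optional[float] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset fields rather than sending explicit nulls.
        return {k: v for k, v in asdict(self).items() if v is not None}


config = GenerationConfig(
    confidence=True,
    grounding=True,
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
print(config.to_dict())
```

The filter checks `is not None` rather than truthiness. This keeps deliberately falsy values such as `temperature=0.0` or `confidence=False` in the payload.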

RequestMetadata

Metadata for the request, typically captured in {"vlmrun": {"metadata": {"environment": <environment>, ...}}}.

Allow Training: Whether the file can be used for training.

Environment: The environment where the request was made.

Session Id
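Here is a small sketch that nests RequestMetadata under the `{"vlmrun": {"metadata": ...}}` shape described above. The helper name `build_request_metadata`, the key names `session_id` and `allow_training`, and the environment value `"dev"` are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def build_request_metadata(
    environment: str,
    session_id: Optional[str] = None,
    allow_training: Optional[bool] = None,
) -> Dict[str, Any]:
    # Hypothetical helper. ASSUMPTION: key names are derived from the
    # field titles; only fields that are actually set are included.
    metadata: Dict[str, Any] = {"environment": environment}
    if session_id is not None:
        metadata["session_id"] = session_id
    if allow_training is not None:
        metadata["allow_training"] = allow_training
    return {"vlmrun": {"metadata": metadata}}


print(build_request_metadata("dev", allow_training=False))
# {'vlmrun': {'metadata': {'environment': 'dev', 'allow_training': False}}}
```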

ValidationError

Location

Error Type

DocumentFilePredictionRequest

Request to the VLM API using a document (doc, docx, pptx, pdf).
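A client can check the supported formats listed for DocumentFilePredictionRequest before it makes a request. In this sketch, `is_supported_document` and `build_document_request` are hypothetical helpers. The `url`, `domain`, and `config` body keys are assumptions, not confirmed request fields. `document.invoice` serves only as an example domain name.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Formats listed for DocumentFilePredictionRequest: doc, docx, pptx, pdf.
SUPPORTED_DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pptx", ".pdf"}


def is_supported_document(location: str) -> bool:
    # Accept plain filenames or URLs. urlparse drops any query string,
    # and the suffix is lower-cased so "Report.PDF" also passes.
    path = urlparse(location).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_DOCUMENT_EXTENSIONS


def build_document_request(
    url: str,
    domain: str,
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # ASSUMPTION: the "url", "domain" and "config" keys are illustrative,
    # not confirmed API fields.
    if not is_supported_document(url):
        raise ValueError(f"unsupported document type: {url!r}")
    body: Dict[str, Any] = {"url": url, "domain": domain}
    if config:
        body["config"] = config
    return body


print(build_document_request("https://example.com/files/q3.pdf?sig=abc", "document.invoice"))
```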

HTTPBearer
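HTTPBearer indicates bearer-token authentication, where the token (assumed here to be your API key) is sent in the `Authorization` header as `Bearer <token>`. The sketch below uses Python's standard `urllib.request` to build such a request without sending it. The base URL `https://api.example.com/v1` and the `/document/generate` path are placeholders, not confirmed endpoints. `sk-test` is a dummy key.

```python
import json
import urllib.request
from typing import Any, Dict

# ASSUMPTION: placeholder base URL for illustration; substitute the real one.
API_BASE = "https://api.example.com/v1"


def bearer_headers(api_key: str) -> Dict[str, str]:
    # Bearer authentication carries the token in the Authorization header.
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def make_generate_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    # Build, but do not send, a JSON POST request. The path is hypothetical.
    return urllib.request.Request(
        f"{API_BASE}/document/generate",
        data=json.dumps(body).encode("utf-8"),
        headers=bearer_headers(api_key),
        method="POST",
    )


req = make_generate_request("sk-test", {"url": "https://example.com/a.pdf", "domain": "document.invoice"})
# Passing req to urllib.request.urlopen would send it; that step is omitted here.
```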

PredictionResponse

Base prediction response for all API responses.

HTTPValidationError
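This sketch assumes these error schemas follow the standard FastAPI shape: an HTTPValidationError whose `detail` list holds ValidationError objects with `loc`, `msg`, and `type` keys, matching the Location, Message, and Error Type titles. Under that assumption, a client could flatten the errors into readable lines as shown. `format_validation_errors` and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def format_validation_errors(payload: Dict[str, Any]) -> List[str]:
    # ASSUMPTION: standard FastAPI 422 body:
    # {"detail": [{"loc": [...], "msg": "...", "type": "..."}, ...]}
    lines: List[str] = []
    for err in payload.get("detail", []):
        # loc is a path such as ["body", "config", "max_tokens"]. List indices
        # may appear as integers, so every part is converted to str first.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')} ({err.get('type', 'unknown')})")
    return lines


# Hypothetical error body, for illustration only.
sample = {
    "detail": [
        {
            "loc": ["body", "config", "max_tokens"],
            "msg": "value is not a valid integer",
            "type": "type_error.integer",
        }
    ]
}
for line in format_validation_errors(sample):
    print(line)
# body.config.max_tokens: value is not a valid integer (type_error.integer)
```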

Document Generate

Authorizations

Body

Response