VLM-1 can be used to extract rich insights from audio podcasts and interviews. The API transcribes the audio content and extracts structured metadata such as topics, entities, and sentiment. This is useful for content analysis, search, and recommendation systems.

Notebook Example

If you simply want to look at the code, skip directly to the Colab notebook here.

Key Features

In the sections below, we’ll showcase a few notable features of the API for analyzing podcasts or audio interviews.

1. Automatic Chaptering and Topic Extraction

VLM-1 can automatically generate chapter summaries for audio podcasts or interviews. This is useful for creating a table of contents for the audio, or for summarizing the key points discussed in the episode. As the sample output below shows, the API segments the audio transcription with relevant timestamps and extracts the topics discussed in each individual segment. With this simplified API, the entire episode can be split into chapters based on the topics discussed, and the API can even generate catchy titles for each segment.
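To make the chaptered output concrete, here is a minimal sketch of how such segments might be represented and turned into a table of contents. Note that the `Chapter` schema and its field names are hypothetical illustrations of the kind of structured output described above, not the actual VLM-1 response format.

```python
from dataclasses import dataclass

@dataclass
class Chapter:
    # Hypothetical chapter schema: start/end timestamps in seconds,
    # a generated title, and the topics discussed in the segment.
    start: float
    end: float
    title: str
    topics: list[str]

def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS for a table of contents."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def table_of_contents(chapters: list[Chapter]) -> str:
    """Build a human-readable table of contents from chapter segments."""
    return "\n".join(
        f"[{format_timestamp(c.start)}-{format_timestamp(c.end)}] "
        f"{c.title} ({', '.join(c.topics)})"
        for c in chapters
    )

# Example data standing in for a parsed API response.
chapters = [
    Chapter(0, 185, "Welcome and Guest Intro", ["introductions"]),
    Chapter(185, 642, "Scaling Vision-Language Models", ["VLMs", "scaling laws"]),
]
print(table_of_contents(chapters))
```

Once the response is parsed into typed segments like this, generating show notes, deep links, or a clickable chapter list is a small formatting step.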

2. Extracting Entities and Sentiment Analysis

VLM-1 can also extract entities and perform sentiment analysis on the audio content. This is useful for identifying the key entities mentioned in each individual segment and for understanding the sentiment at different points in a conversation. As the sample output below shows, the API extracts the entities mentioned in the audio content and provides a sentiment prediction ("positive", "negative", "neutral") for each segment. This can be used to track sentiment over time, or to identify key topics of interest and preference in the conversation.
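The segment-level entities and sentiment labels described above could be consumed as sketched below. The `Segment` structure and its field names are hypothetical; only the three sentiment labels ("positive", "negative", "neutral") come from the sample output.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Segment:
    # Hypothetical per-segment result: a start timestamp, one of the
    # three sentiment labels, and the entities mentioned in the segment.
    start: float
    sentiment: str  # "positive" | "negative" | "neutral"
    entities: list[str] = field(default_factory=list)

def sentiment_timeline(segments: list[Segment]) -> list[tuple[float, str]]:
    """Pair each segment's start time with its sentiment, for tracking over time."""
    return [(s.start, s.sentiment) for s in segments]

def entity_mentions(segments: list[Segment]) -> Counter:
    """Count how often each entity is mentioned across the episode."""
    return Counter(e for s in segments for e in s.entities)

# Example data standing in for a parsed API response.
segments = [
    Segment(0.0, "neutral", ["OpenAI"]),
    Segment(120.0, "positive", ["OpenAI", "GPT-4"]),
    Segment(300.0, "negative", ["GPT-4"]),
]
print(sentiment_timeline(segments))
print(entity_mentions(segments).most_common())
```

The timeline pairs can be plotted directly to visualize how the tone of the conversation shifts, while the mention counts surface the entities the episode spends the most time on.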