# VLM Run

> Documentation for VLM Run, the unified gateway for visual intelligence. Understand, reason over, and act on images, video, and documents with a single API.

## Docs

- [FAQ](https://docs.vlm.run/FAQ.md): Frequently Asked Questions.
- [Multi-modal Artifacts](https://docs.vlm.run/agents/artifacts.md): Retrieve generated images, videos, audio, and documents from agent responses
- [Layout Detection](https://docs.vlm.run/agents/capabilities/document/layout-understanding.md): Identify and analyze document structure with visual result previews showing highlighted extractions and field overlays
- [Multi-Page Analysis](https://docs.vlm.run/agents/capabilities/document/multi-page-analysis.md): Process and analyze documents across multiple pages with context preservation and cross-document correlation
- [Visual Grounding](https://docs.vlm.run/agents/capabilities/document/visual-grounding.md): Connect text elements with their visual locations in documents for precise content understanding
- [Caption & Tag](https://docs.vlm.run/agents/capabilities/image/captioning.md): Generate detailed captions and tags for images using advanced vision models.
- [Detection](https://docs.vlm.run/agents/capabilities/image/detection.md): Detect and locate objects, faces, or people in images with bounding boxes and confidence scores
- [Generate & Edit](https://docs.vlm.run/agents/capabilities/image/generation.md): Generate and edit images from text prompts, sketches, or existing images with creative control
- [Pointing](https://docs.vlm.run/agents/capabilities/image/pointing.md): Detect and predict key anatomical points and structural features in images with sub-pixel accuracy
- [Segmentation](https://docs.vlm.run/agents/capabilities/image/segmentation.md): Create precise pixel-level segmentation masks for objects, regions, and features in images
- [Image Tools](https://docs.vlm.run/agents/capabilities/image/tools.md): Image tools for cropping, rotating, enhancing, and transforming images
- [UI Parsing](https://docs.vlm.run/agents/capabilities/image/ui-parsing.md): Analyze and understand user interface elements in screenshots and application images
- [Caption & Tag](https://docs.vlm.run/agents/capabilities/video/captioning.md): Generate detailed captions and tags for videos using advanced vision models.
- [Generate & Edit](https://docs.vlm.run/agents/capabilities/video/generation.md): Create and edit videos with AI-powered tools for content creation and manipulation
- [Video Tools](https://docs.vlm.run/agents/capabilities/video/tools.md): Video tools for trimming, sampling, and extracting segments from videos
- [Chat with Orion](https://docs.vlm.run/agents/chat.md): Bridging computer-vision tools to AI agents through language.
- [Multi-modal Inputs](https://docs.vlm.run/agents/inputs.md): Encode images, videos, documents, and other media in a consistent format for agent execution and chat completions
- [Instructor Compatibility](https://docs.vlm.run/agents/integrations/integrations-instructor.md): Run VLM Run Agents with the Instructor Python SDK with minimal code changes.
- [OpenAI Compatibility](https://docs.vlm.run/agents/integrations/integrations-openai-compatibility.md): Run VLM Run Agents with the OpenAI Python SDK with just a two-line code change (see the sketch at the end of this page).
- [Introduction](https://docs.vlm.run/agents/introduction.md): Introducing VLM Run Orion – the first visual agent that sees, reasons, and acts.
- [Pricing](https://docs.vlm.run/agents/pricing.md): Credit-based pricing for Orion agents.
- [Structured Responses](https://docs.vlm.run/agents/structured-responses.md): Agents that reliably return JSON via chat completions – with schema validation.
- [Overview](https://docs.vlm.run/api-reference/index.md)
- [Delete File](https://docs.vlm.run/api-reference/v1/files/delete-file.md): Delete a file by ID. Only available for Pro and Enterprise users.
- [Get File by ID](https://docs.vlm.run/api-reference/v1/files/get-files-by-id.md): Get a file by ID.
- [List Files](https://docs.vlm.run/api-reference/v1/files/get-files-list.md): Get all files uploaded by the user, with pagination.
- [Upload File](https://docs.vlm.run/api-reference/v1/files/post-file-upload.md): Upload a file.
- [Get Artifact](https://docs.vlm.run/api-reference/v1/get-artifact-by-id.md): Retrieve an artifact by session ID or execution ID.
- [List models](https://docs.vlm.run/api-reference/v1/get-models.md): Get the list of supported models.
- [Health](https://docs.vlm.run/api-reference/v1/health.md): Health check endpoint.
- [List domains](https://docs.vlm.run/api-reference/v1/hub/get-domains.md): Get the list of supported domains.
- [Audio → JSON](https://docs.vlm.run/api-reference/v1/post-audio-generate.md): Generate a structured prediction for the given audio file.
- [Doc → JSON](https://docs.vlm.run/api-reference/v1/post-document-generate.md): Generate a structured prediction for the given document.
- [Image → JSON](https://docs.vlm.run/api-reference/v1/post-image-generate.md): Generate a structured prediction for the given image.
- [Get Schema](https://docs.vlm.run/api-reference/v1/post-schema.md)
- [Submit Feedback](https://docs.vlm.run/api-reference/v1/post-submit-feedback.md): Submit feedback for a request, execution, or chat by its ID.
- [Video → JSON](https://docs.vlm.run/api-reference/v1/post-video-generate.md): Generate a structured prediction for the given video file.
- [Get Prediction by ID](https://docs.vlm.run/api-reference/v1/predictions/get-predictions-by-id.md): Get prediction JSON by request ID.
- [Get Predictions](https://docs.vlm.run/api-reference/v1/predictions/get-predictions-list.md): Get all predictions made by the user, with pagination.
- [Custom Schemas](https://docs.vlm.run/capabilities/custom-schemas.md): Define custom schemas for visual extraction tasks.
- [GraphQL](https://docs.vlm.run/capabilities/graphql.md): Query a subset of schema fields for more efficient querying and document ETL.
- [Long-context Outputs](https://docs.vlm.run/capabilities/long-context-outputs.md): Support for long-context outputs in domains like audio/video transcription, exceeding 8K-token limits.
- [Structured Responses](https://docs.vlm.run/capabilities/structured-responses.md): Extract JSON from images, videos, and documents with type safety.
- [Temporal Grounding](https://docs.vlm.run/capabilities/temporal-grounding.md): Ground extracted data with start/end times for audio/video segments and speaker identification.
- [Visual Grounding](https://docs.vlm.run/capabilities/visual-grounding.md): Ground extracted data with location (bounding box) coordinates and confidence scores.
- [Changelog](https://docs.vlm.run/changelog.md): Changelog for VLM Run.
- [chat](https://docs.vlm.run/cli/chat.md): Chat with Orion to process images, videos, and documents
- [files](https://docs.vlm.run/cli/files.md): Upload, list, retrieve, and delete files
- [generate](https://docs.vlm.run/cli/generate.md): Generate structured predictions from images and documents
- [Getting Started](https://docs.vlm.run/cli/getting-started.md): Install and configure the VLM Run CLI
- [hub & models](https://docs.vlm.run/cli/hub.md): Browse domains, schemas, and available models
- [predictions](https://docs.vlm.run/cli/predictions.md): List and retrieve prediction results
- [skills](https://docs.vlm.run/cli/skills.md): Create, list, look up, update, and download skills
- [Error Codes](https://docs.vlm.run/error-codes.md): List of error codes that you may encounter when using the API
- [Transcribing Audio](https://docs.vlm.run/guides/audio-ai/guide-audio-transcription.md): Learn how to transcribe and analyze long-form audio.
- [Classifying Documents](https://docs.vlm.run/guides/doc-ai/guide-classifying-documents.md): Learn how to classify documents into categories like invoices, bank statements, and utility bills.
- [Document Redaction & Edit](https://docs.vlm.run/guides/doc-ai/guide-document-redaction.md): Automatically detect and redact or replace sensitive information in documents with enterprise-grade compliance.
- [Parsing Intake Forms](https://docs.vlm.run/guides/doc-ai/guide-healthcare-parsing-intake-forms.md): Extract structured data from healthcare documents like patient referrals, intake forms, and insurance cards.
- [Parsing Documents](https://docs.vlm.run/guides/doc-ai/guide-parsing-documents.md): Extract structured data from long documents and reports.
- [Parsing Invoices](https://docs.vlm.run/guides/doc-ai/guide-parsing-invoices.md): Extract structured data from invoices.
- [Providing Feedback](https://docs.vlm.run/guides/feedback.md): Improve model performance through feedback collection and fine-tuning.
- [Cataloging Images](https://docs.vlm.run/guides/image-ai/guide-cataloging-images.md): Learn how to generate captions, tags, and descriptions for images.
- [Classifying Images](https://docs.vlm.run/guides/image-ai/guide-classifying-images.md): Learn how to classify images into categories like animals, landscapes, and objects using AI.
- [Best Practices](https://docs.vlm.run/guides/schema/schema-best-practices.md): Best practices for designing schemas for visual inputs.
- [MarkdownPage](https://docs.vlm.run/guides/schema/schema-markdown-page.md): A visual guide to the MarkdownPage schema used for document extraction and processing.
- [Transcribing Video](https://docs.vlm.run/guides/video-ai/guide-video-transcription.md): Learn how to transcribe and analyze hours-long video content using our Video Transcription API.
- [Supported Domains](https://docs.vlm.run/hub.md): Pre-built schemas and domain definitions for common data extraction tasks.
- [MongoDB](https://docs.vlm.run/integrations/integrations-mongodb.md)
- [n8n](https://docs.vlm.run/integrations/integrations-n8n.md)
- [Voxel51 FiftyOne](https://docs.vlm.run/integrations/integrations-voxel51.md)
- [Zapier](https://docs.vlm.run/integrations/integrations-zapier.md)
- [Introduction](https://docs.vlm.run/introduction.md): Extract JSON from images, videos, and documents with a unified API.
- [Chat](https://docs.vlm.run/platform/chat.md): The interactive playground for chatting with Orion, VLM Run's visual agent
- [Completions](https://docs.vlm.run/platform/observe/completions.md): Review model completions, token usage, and response quality on the VLM Run platform
- [Evaluations](https://docs.vlm.run/platform/observe/evaluations.md): Measure and track the accuracy of your skills, agents, and request domains using feedback as ground truth.
- [Executions](https://docs.vlm.run/platform/observe/executions.md): Track agent and skill executions end to end on the VLM Run platform
- [Observe](https://docs.vlm.run/platform/observe/overview.md): Full observability for your visual AI: requests, executions, completions, and usage metrics
- [Requests](https://docs.vlm.run/platform/observe/requests.md): View, filter, and inspect every API request on the VLM Run platform
- [Platform](https://docs.vlm.run/platform/overview.md): The VLM Run platform: chat with visual agents, build skills, and observe every request in one place
- [Settings](https://docs.vlm.run/platform/settings.md): Manage API keys, team members, billing, and account preferences
- [Skills](https://docs.vlm.run/platform/skills/overview.md): Create, edit, and manage reusable visual extraction skills on the VLM Run platform
- [Pricing](https://docs.vlm.run/pricing.md): Flexible pricing plans for developers and enterprises to build with VLM Run.
- [Rate Limits](https://docs.vlm.run/rate-limits.md): Rate limits to consider when using the API.
- [client.agent](https://docs.vlm.run/sdk-reference/components/agent.md): Agent Chat Completions
- [Client Reference](https://docs.vlm.run/sdk-reference/components/client.md): Detailed guide to the VLM Run Python SDK client
- [client.files](https://docs.vlm.run/sdk-reference/components/files.md): Manage files with the VLM Run Python SDK
- [client.hub](https://docs.vlm.run/sdk-reference/components/hub.md): Hub API Reference
- [client.models](https://docs.vlm.run/sdk-reference/components/models.md): Models API Reference
- [SDK Overview](https://docs.vlm.run/sdk-reference/components/overview.md): Core concepts and components of the VLM Run Python SDK
- [client.predictions](https://docs.vlm.run/sdk-reference/components/predictions.md): Manage predictions with the VLM Run Python SDK
- [Getting Started](https://docs.vlm.run/sdk-reference/getting-started.md): How to get started with the VLM Run Python SDK
- [client.agent](https://docs.vlm.run/sdk-reference/node/components/agent.md): Learn how to use Agent Chat Completions with the VLM Run Node.js SDK
- [client](https://docs.vlm.run/sdk-reference/node/components/client.md): VLM Run Node.js SDK Client Configuration and Usage
- [client.files](https://docs.vlm.run/sdk-reference/node/components/files.md): Learn how to upload and manage files with the VLM Run Node.js SDK
- [client.hub](https://docs.vlm.run/sdk-reference/node/components/hub.md): Hub API Reference for the VLM Run Node.js SDK
- [client.models](https://docs.vlm.run/sdk-reference/node/components/models.md): Learn how to work with models in the VLM Run Node.js SDK
- [client.predictions](https://docs.vlm.run/sdk-reference/node/components/predictions.md): Manage predictions with the VLM Run Node.js SDK
- [Getting Started](https://docs.vlm.run/sdk-reference/node/getting-started.md): Learn how to install and use the VLM Run Node.js SDK
- [client.audio](https://docs.vlm.run/sdk-reference/node/predictions/audio.md): Learn how to process audio files with the VLM Run Node.js SDK
- [client.document](https://docs.vlm.run/sdk-reference/node/predictions/document.md): Learn how to process documents with the VLM Run Node.js SDK
- [client.image](https://docs.vlm.run/sdk-reference/node/predictions/image.md): Learn how to process images with the VLM Run Node.js SDK
- [client.video](https://docs.vlm.run/sdk-reference/node/predictions/video.md): Video Processing API for the VLM Run Node.js SDK
- [client.audio](https://docs.vlm.run/sdk-reference/predictions/audio.md): Audio Processing API
- [client.document](https://docs.vlm.run/sdk-reference/predictions/document.md): Document Processing API
- [client.image](https://docs.vlm.run/sdk-reference/predictions/image.md): Image Processing API
- [client.video](https://docs.vlm.run/sdk-reference/predictions/video.md): Video Processing API
- [Orion Skills](https://docs.vlm.run/skills/introduction.md): Modular, reusable capabilities for visual extraction and agent workflows
- [Create Skills](https://docs.vlm.run/skills/manage/create.md): Create skills from skill folders, prompts, or chat sessions
- [List & Lookup](https://docs.vlm.run/skills/manage/list-lookup.md): List and search for available skills
- [Update Skills](https://docs.vlm.run/skills/manage/update.md): Create new versions of existing skills
- [Quickstart](https://docs.vlm.run/skills/quickstart.md): Use a skill to extract structured data in under 2 minutes
- [Reference](https://docs.vlm.run/skills/reference.md): AgentSkill object and skill specification reference
- [Skill Structure](https://docs.vlm.run/skills/spec/overview.md): How a skill directory is organized
- [schema.json](https://docs.vlm.run/skills/spec/schema-json.md): JSON Schema for validating skill output
- [SKILL.md](https://docs.vlm.run/skills/spec/skill-md.md): Skill metadata and instructions format
- [vlmrun.yaml](https://docs.vlm.run/skills/spec/vlmrun-yaml.md): Execution configuration for agent-powered skills
- [Agent Execution](https://docs.vlm.run/skills/usage/agent.md): Use skills with the agent execution endpoint
- [Chat Completions](https://docs.vlm.run/skills/usage/chat.md): Use skills with the chat completions endpoint
- [Model Request](https://docs.vlm.run/skills/usage/generation.md): Use skills with model requests
- [Version Pinning](https://docs.vlm.run/skills/usage/version-pinning.md): Pin skill versions for reproducible results
- [Supported Files](https://docs.vlm.run/supported-files.md): File formats supported by VLM Run for document, image, video, and audio processing.
- [Ways to Use VLM Run](https://docs.vlm.run/ways-to-use-vlm-run.md): The four entry points into VLM Run (Requests, Executions, Chat Completions API, and Chat UI) and when to reach for each.
- [Webhooks](https://docs.vlm.run/webhooks.md): Receive real-time notifications when your async processing jobs complete

## OpenAPI Specs

- [openapi](https://api.vlm.run/openapi.json)
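The spec linked above is a standard OpenAPI JSON document, so a quick way to survey the REST surface before opening the per-endpoint pages is to fetch it and print its paths. A minimal, self-contained sketch that assumes only the third-party `requests` package:

```python
# Fetch the published OpenAPI spec and list the available REST endpoints.
# Uses only the URL given above; requires the third-party `requests` package.
import requests

spec = requests.get("https://api.vlm.run/openapi.json", timeout=30).json()

# Print the API title/version, then each path with its HTTP methods.
info = spec.get("info", {})
print(info.get("title"), info.get("version"))
for path, methods in sorted(spec.get("paths", {}).items()):
    print(f"{', '.join(m.upper() for m in methods)}  {path}")
```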
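## Example: OpenAI-Compatible Chat Completions

Several pages above ([OpenAI Compatibility](https://docs.vlm.run/agents/integrations/integrations-openai-compatibility.md), [Structured Responses](https://docs.vlm.run/agents/structured-responses.md), and [Multi-modal Inputs](https://docs.vlm.run/agents/inputs.md)) describe an OpenAI-compatible chat completions surface where only the client construction changes. The sketch below illustrates that shape with the standard `openai` Python SDK; the base URL (`https://api.vlm.run/v1`) and the `orion` model identifier are illustrative assumptions, so consult the linked pages for the exact values.

```python
# A minimal sketch of the OpenAI-compatible flow, assuming the standard
# `openai` Python SDK. Only the two client-construction lines differ from
# stock OpenAI usage; base URL and model name are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.vlm.run/v1",     # assumed VLM Run endpoint
    api_key=os.environ["VLMRUN_API_KEY"],  # your VLM Run API key
)

# Chat completions accept image URLs alongside text (see "Multi-modal Inputs").
response = client.chat.completions.create(
    model="orion",  # hypothetical model identifier; see the docs for real values
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this document."},
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because the request and response shapes match the OpenAI SDK, tooling built on top of it (including Instructor, per the Instructor Compatibility page) should carry over with the same two-line change.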