MCP Tools Reference - VLM Run

The VLM Run MCP Server provides a comprehensive set of tools for processing images, documents, and videos. All tools are available through any MCP-compatible client.

MCP Tools Showcase

See the MCP tools showcase notebook in our playground for examples of these tools in action.

Available Tools

I/O Tools

Tool	Description
`put_image_url`	Load images from URLs into the system for processing by other tools.
`put_file_url`	Load files from URLs into the system for processing by other tools.
`preview_file_url`	Generate preview URLs for images, files, and other objects stored in the system.
`preview_image`	Get a preview image from an image reference that can be rendered inline in a UI.
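
As a rough sketch of how an MCP client might invoke one of these tools: in the MCP protocol, a tool call is a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. The `url` argument name and the example URL below are assumptions for illustration only; the actual parameters are defined by each tool's input schema, which a client can retrieve with `tools/list`.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC requires an id per request.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# NOTE: the "url" argument name is hypothetical; check the tool's input schema.
request = build_tool_call("put_image_url", {"url": "https://example.com/photo.jpg"})
print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you; the sketch only shows what travels over the wire.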

Image Tools

Tool	Description
`detect_faces`	Detect faces in images and return facial features, landmarks, and confidence scores.
`detect_objects`	Detect and classify objects in images with bounding boxes and labels.
`detect_texts`	Extract text from images using OCR, returning detected text with bounding boxes and confidence scores.
`detect_logos`	Detect brand logos and trademarks in images with confidence scores and bounding box locations.
`detect_barcodes`	Detect and decode barcodes and QR codes in images, returning data, type, and location.
`parse_image`	Parse the image into a structured format.
`rotate_image`	Rotate images counter-clockwise by a specified angle in degrees. Supports 90°, 180°, and -90° rotations.
`crop_image`	Crop images to specified bounding box coordinates for precise region extraction.
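
To make the counter-clockwise angle convention of `rotate_image` concrete, here is a small local sketch that rotates a 2D grid of pixel values by multiples of 90°. It is an illustration of the convention only, not the server's implementation; the function name `rotate_ccw` is hypothetical.

```python
def rotate_ccw(grid, degrees):
    """Rotate a 2D grid counter-clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    # Normalize to 0..3 quarter-turns; -90 becomes 3 CCW turns (one clockwise turn).
    turns = (degrees // 90) % 4
    for _ in range(turns):
        # One CCW quarter-turn: transpose, then reverse the row order.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


pixels = [[1, 2],
          [3, 4]]

print(rotate_ccw(pixels, 90))   # [[2, 4], [1, 3]]
print(rotate_ccw(pixels, 180))  # [[4, 3], [2, 1]]
print(rotate_ccw(pixels, -90))  # [[3, 1], [4, 2]]
```

Under this convention, a negative angle such as -90° is equivalent to a 90° clockwise rotation.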

Document Tools

Tool	Description
`document_images`	Extract pages from documents as images with configurable offset and limit for batch processing.
`detect_document_layout`	Extract document layout including tables, figures, paragraphs, headers, and other structural elements.
`detect_document_image_layout`	Extract the layout of a single document image, including tables, figures, paragraphs, headers, and other structural elements.
`parse_document`	Parse documents into structured format using specific domain schemas (e.g., invoices, receipts).
`parse_document_image`	Parse single document images into structured format using domain-specific schemas.
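
The offset and limit options of `document_images` lend themselves to paging through a long document in batches. The sketch below computes the batch windows a client might request; it assumes a zero-based page offset and a limit that counts pages, neither of which is confirmed by this page, and the helper name `page_batches` is hypothetical.

```python
def page_batches(total_pages, batch_size):
    """Yield (offset, limit) pairs that cover all pages in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, total_pages, batch_size):
        # The final batch may be shorter than batch_size.
        yield offset, min(batch_size, total_pages - offset)


# Assumption: offset is zero-based and limit is a page count.
batches = list(page_batches(total_pages=25, batch_size=10))
print(batches)  # [(0, 10), (10, 10), (20, 5)]
```

Each pair could then be passed as the offset and limit of one `document_images` call, keeping individual responses small.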

Video Tools

Tool	Description
`search_video`	Search a video for specific content and return relevant scenes, frames, or timestamps based on your query.
`transcribe_video`	Transcribe entire videos, returning a detailed audio transcript and a visual scene breakdown in segments of roughly 30 seconds.
`add_video_captions`	Automatically add subtitles or captions to video clips with customizable colors and positioning.

Hub Tools

Tool	Description
`get_hub_info`	Get hub version information and additional details about the domain catalog.
`list_hub_domains`	List all available domains for extraction, with optional filtering by category or name.
`get_hub_schema`	Get the complete JSON schema for any domain to understand the expected output structure.

Supported Inputs

Format Type	Supported Extensions
Image	`.jpg`, `.jpeg`, `.png`, `.webp`, `.bmp`, `.tiff`, `.tif`, `.heic`
Document	`.pdf`
Video	`.mp4`, `.avi`, `.mov`, `.webm`
Audio	`.mp3`, `.wav`, `.m4a`, `.aac`, `.flac`
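
A client can check a file against the table above before uploading it. The following is a minimal sketch that maps a URL or path to its format type by extension; the helper name `input_type` is hypothetical, and matching on the extension alone is an assumption about how a client might pre-validate, not a description of how the server detects formats.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the Supported Inputs table.
SUPPORTED = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".heic"},
    "document": {".pdf"},
    "video": {".mp4", ".avi", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".flac"},
}


def input_type(url_or_path: str) -> Optional[str]:
    """Return the format type for a URL or path, or None if unsupported."""
    # Drop any query string or fragment before reading the extension.
    path = urlparse(url_or_path).path
    ext = PurePosixPath(path).suffix.lower()
    for kind, extensions in SUPPORTED.items():
        if ext in extensions:
            return kind
    return None


print(input_type("https://example.com/a/clip.mov?token=abc"))  # video
print(input_type("scan.PDF"))                                   # document
print(input_type("notes.docx"))                                 # None
```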
