MCP Tools Reference
Complete reference of all available VLM Run MCP tools for visual AI processing.
The VLM Run MCP Server provides a comprehensive set of tools for processing images, documents, and videos. All tools are available through any MCP-compatible client.
Available Tools
I/O Tools
Tool | Description |
---|---|
put_image_url | Load images from URLs into the system for processing by other tools. |
put_file_url | Load files from URLs into the system for processing by other tools. |
preview_object_ref | Generate preview URLs for images, files, and other objects stored in the system. |
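MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of what such a request could look like for `put_image_url`. The argument name `url` and the example URL are assumptions made for illustration, not taken from this reference; the tool's actual input schema can be read from the server's `tools/list` response.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: the argument is named "url"; the URL is a placeholder.
request = build_tool_call("put_image_url", {"url": "https://example.com/photo.jpg"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this message for you; the sketch only shows the shape of the payload.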
Image Tools
Tool | Description |
---|---|
detect_faces | Detect faces in images and return facial features, landmarks, and confidence scores. |
detect_objects | Detect and classify objects in images with bounding boxes and labels. |
detect_texts | Extract text from images using OCR, returning detected text with bounding boxes and confidence scores. |
detect_logos | Detect brand logos and trademarks in images with confidence scores and bounding box locations. |
detect_barcodes | Detect and decode barcodes and QR codes in images, returning data, type, and location. |
parse_image | Parse images into a structured format. |
rotate_image | Rotate images counter-clockwise by a specified angle in degrees. Supports rotations of 90°, 180°, and -90°. |
crop_image | Crop images to specified bounding box coordinates for precise region extraction. |
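Because `rotate_image` lists only 90°, 180°, and -90° as supported rotations, a caller holding an arbitrary right-angle rotation may want to normalize it first. The helper below is a hypothetical client-side sketch, not part of the VLM Run API; it maps any multiple of 90° to one of the listed values, or to 0 when no rotation is needed.

```python
def to_supported_rotation(angle: int) -> int:
    """Map a counter-clockwise angle (a multiple of 90) to 0, 90, 180, or -90."""
    if angle % 90 != 0:
        raise ValueError(f"angle must be a multiple of 90 degrees, got {angle}")
    normalized = angle % 360  # Python's % always yields a value in [0, 360)
    # 270 degrees counter-clockwise is the same rotation as -90 degrees.
    return -90 if normalized == 270 else normalized


for angle in (270, -270, 450, -180, 360):
    print(angle, "->", to_supported_rotation(angle))
```

A result of 0 means the image is already in the requested orientation, so the `rotate_image` call can be skipped.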
Document Tools
Tool | Description |
---|---|
document_images | Extract pages from documents as images with configurable offset and limit for batch processing. |
detect_document_layout | Extract document layout including tables, figures, paragraphs, headers, and other structural elements. |
detect_document_image_layout | Extract the layout of a single document image, including tables, figures, paragraphs, headers, and other structural elements. |
parse_document | Parse documents into structured format using specific domain schemas (e.g., invoices, receipts). |
parse_document_image | Parse single document images into structured format using domain-specific schemas. |
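The offset and limit options of `document_images` suggest paging through long documents in batches. The sketch below computes the `(offset, limit)` pairs needed to cover a document. It assumes a zero-based offset and a page count known in advance; neither is confirmed by this reference, and `page_batches` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover `total_pages` pages in order."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, total_pages, batch_size):
        # The final batch may be shorter than batch_size.
        yield offset, min(batch_size, total_pages - offset)


batches = list(page_batches(total_pages=25, batch_size=10))
print(batches)  # [(0, 10), (10, 10), (20, 5)]
```

Each pair would then be passed to one `document_images` call, so a 25-page document is processed in three requests instead of one large one.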
Video Tools
Tool | Description |
---|---|
search_video | Search a video for specific content and return relevant scenes, frames, or timestamps based on your query. |
transcribe_video | Transcribe entire videos, returning a detailed audio transcript and a visual scene breakdown for each ~30-second segment. |
add_video_captions | Automatically add subtitles or captions to video clips with customizable colors and positioning. |
Hub Tools
Tool | Description |
---|---|
get_hub_info | Get hub version information and additional details about the domain catalog. |
list_hub_domains | List all available domains for extraction, with optional filtering by category or name. |
get_hub_schema | Get the complete JSON schema for any domain to understand the expected output structure. |
Supported Inputs
Format Type | Supported Extensions |
---|---|
Image | .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .heic |
Document | .pdf |
Video | .mp4, .avi, .mov, .webm |
Audio | .mp3, .wav, .m4a, .aac, .flac |
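A client can check a file against the table above before sending it to the server. The sketch below is one way to do that; `input_type` and `SUPPORTED_EXTENSIONS` are hypothetical names, and matching on the file extension alone is a simplification, since it does not inspect the file's contents.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the Supported Inputs table.
SUPPORTED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".heic"},
    "document": {".pdf"},
    "video": {".mp4", ".avi", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".flac"},
}


def input_type(url_or_path: str) -> Optional[str]:
    """Return the format type for a URL or path, or None if unsupported."""
    path = urlparse(url_or_path).path  # drops any query string or fragment
    suffix = PurePosixPath(path).suffix.lower()
    for kind, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return kind
    return None


print(input_type("https://example.com/scans/invoice.PDF?download=1"))
print(input_type("clip.mov"))
print(input_type("notes.txt"))
```

Returning `None` for unknown extensions lets the caller decide whether to reject the input or attempt the upload anyway.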