The VLM Run MCP Server provides a comprehensive set of tools for processing images, documents, and videos. All tools are available through any MCP-compatible client.

Available Tools


I/O Tools

| Tool | Description |
| --- | --- |
| `put_image_url` | Load images from URLs into the system for processing by other tools. |
| `put_file_url` | Load files from URLs into the system for processing by other tools. |
| `preview_object_ref` | Generate preview URLs for images, files, and other objects stored in the system. |
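
MCP clients invoke tools through the protocol's `tools/call` method, which takes a tool name and an arguments object. The sketch below builds such a JSON-RPC 2.0 request for `put_image_url`. The helper `build_tool_call`, the argument name `url`, and the example image URL are assumptions made for illustration and are not taken from the server's published schema; the input schema returned by the client's `tools/list` call is the authoritative source.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that each id be unique per session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build the body of an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# ASSUMPTION: "url" is a hypothetical argument name; check the tool's input schema.
request = build_tool_call("put_image_url", {"url": "https://example.com/photo.jpg"})
print(json.dumps(request, indent=2))
```

Most MCP client libraries construct this envelope themselves, so in practice only the tool name and the arguments object need to be supplied.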

Image Tools

| Tool | Description |
| --- | --- |
| `detect_faces` | Detect faces in images and return facial features, landmarks, and confidence scores. |
| `detect_objects` | Detect and classify objects in images, returning bounding boxes and labels. |
| `detect_texts` | Extract text from images using OCR, returning detected text with bounding boxes and confidence scores. |
| `detect_logos` | Detect brand logos and trademarks in images, returning confidence scores and bounding box locations. |
| `detect_barcodes` | Detect and decode barcodes and QR codes in images, returning data, type, and location. |
| `parse_image` | Parse an image into a structured format. |
| `rotate_image` | Rotate images by a specified angle in degrees (counter-clockwise). Supports 90°, 180°, and -90° rotations. |
| `crop_image` | Crop images to specified bounding box coordinates for precise region extraction. |

Document Tools

| Tool | Description |
| --- | --- |
| `document_images` | Extract pages from documents as images, with configurable offset and limit for batch processing. |
| `detect_document_layout` | Extract the layout of a document, including tables, figures, paragraphs, headers, and other structural elements. |
| `detect_document_image_layout` | Extract the layout of a single document image, including tables, figures, paragraphs, headers, and other structural elements. |
| `parse_document` | Parse documents into a structured format using domain-specific schemas (e.g., invoices, receipts). |
| `parse_document_image` | Parse single document images into a structured format using domain-specific schemas. |
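
`document_images` is described as taking an offset and a limit for batch processing. A minimal sketch of how a client might split a document into page windows follows. The helper `page_batches` is hypothetical, and the assumptions that the argument names are literally `offset` and `limit`, that `offset` is zero-based, and that the page count is known in advance are not confirmed by the tool description.

```python
from typing import Dict, Iterator


def page_batches(total_pages: int, batch_size: int) -> Iterator[Dict[str, int]]:
    """Yield offset/limit windows that together cover every page exactly once."""
    if total_pages < 0 or batch_size <= 0:
        raise ValueError("total_pages must be >= 0 and batch_size must be > 0")
    for offset in range(0, total_pages, batch_size):
        # The final window is shortened so it never runs past the last page.
        yield {"offset": offset, "limit": min(batch_size, total_pages - offset)}


# ASSUMPTION: zero-based offsets; adjust if the server counts pages from 1.
batches = list(page_batches(total_pages=25, batch_size=10))
for batch in batches:
    print(batch)
```

Each yielded dictionary could then be merged into the arguments of one `document_images` call, which keeps individual responses small for long documents.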

Video Tools

| Tool | Description |
| --- | --- |
| `search_video` | Search a video for specific content and return relevant scenes, frames, or timestamps based on your query. |
| `transcribe_video` | Transcribe entire videos, returning a detailed audio transcript and a visual scene breakdown in roughly 30-second segments. |
| `add_video_captions` | Automatically add subtitles or captions to video clips, with customizable colors and positioning. |

Hub Tools

| Tool | Description |
| --- | --- |
| `get_hub_info` | Get hub version information and additional details about the domain catalog. |
| `list_hub_domains` | List all available domains for extraction, with optional filtering by category or name. |
| `get_hub_schema` | Get the complete JSON schema for any domain to understand the expected output structure. |

Supported Inputs


| Format Type | Supported Extensions |
| --- | --- |
| Image | .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .heic |
| Document | .pdf |
| Video | .mp4, .avi, .mov, .webm |
| Audio | .mp3, .wav, .m4a, .aac, .flac |
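
The extension table above can be turned into a client-side check that runs before a file is uploaded. This is a sketch only: `classify_input` is a hypothetical helper, it handles URLs and POSIX-style paths, and it assumes that matching on the file extension (case-insensitively) is a reasonable proxy for what the server accepts.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the Supported Inputs table.
SUPPORTED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".heic"},
    "document": {".pdf"},
    "video": {".mp4", ".avi", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".flac"},
}


def classify_input(url_or_path: str) -> Optional[str]:
    """Return the format type for a URL or path, or None if unsupported."""
    # urlparse drops any query string or fragment, leaving only the path.
    path = urlparse(url_or_path).path
    suffix = PurePosixPath(path).suffix.lower()
    for format_type, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return format_type
    return None


print(classify_input("https://example.com/a/photo.JPG?size=large"))
print(classify_input("notes.txt"))
```

A result of `"image"` would suggest loading the input with `put_image_url`, while other supported types would go through `put_file_url`; a result of `None` signals that the file is likely to be rejected.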