VLM Run MCP Server
Teach your AI agents to See, Act and Automate – with VLM Run MCP
The VLM Run MCP server gives any MCP-compatible AI agent the ability to see and understand visual content, a capability typically missing from LLMs. No complex API integration is needed: just connect your AI agent to our hosted MCP server and instantly unlock the power to process images, documents, videos, and other visual content.
MCP Quickstart
Connect to Claude Desktop and start processing visual content immediately.
Explore MCP Tools
See the complete catalog of visual AI capabilities and tools.
Explore MCP Examples
See examples of how to use VLM Run MCP tools to build your own document processing pipeline.
Sign Up Today
Sign up for an API key and start building visual AI into your agents today.
Bridging computer-vision tools to AI agents through language
VLM Run MCP instantly augments your LLM agents with advanced visual-processing capabilities, with no manual integration. With VLM Run MCP tools, your AI agent can analyze images, extract data from visually complex PDFs, and even process audio and video. The LLM agent automatically selects the right tool for each task.
Let’s take a look at a few use-cases that can be automated with VLM Run MCP tools.
Installation
Get your API key
Head over to the VLM Run Dashboard to get your API key ($VLMRUN_API_KEY). We’ll use this to authenticate your requests to the MCP server next.
Add the server to your MCP client
Add our hosted server to your MCP client configuration using the following syntax. It works with Claude Desktop, the OpenAI API, the Gemini SDK, or any MCP-compatible platform.
Authentication using the above approach is experimental and subject to change. We will soon announce OAuth 2.1-based authentication, as specified by the MCP spec.
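For reference, a minimal client configuration might look like the sketch below. The `mcpServers` key and `url` field follow a convention used by several MCP clients, but exact configuration keys vary by client, so consult your client's documentation; replace `YOUR_API_KEY` with the key from your dashboard:

```json
{
  "mcpServers": {
    "vlm-run": {
      "url": "https://mcp.vlm.run/YOUR_API_KEY/sse"
    }
  }
}
```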
Ping the MCP server to test
Paste the server URL above into your browser and you should see a ping response from the MCP server.
Start building your agent with VLM Run MCP
Head over to the quickstart page and tools page to get started with VLM Run MCP tools. Our intro MCP notebook is also a great place to start.
Current Capabilities
Take a quick look at the current catalog of visual AI tools available through the VLM Run MCP server. We’re constantly adding new tools and capabilities, so this list is always evolving. Join our Discord channel to stay updated on the latest features, and feel free to request new tools.
Core Processing Tools
- I/O Tools: Load images, files, and other objects into the system for processing by other tools.
- Document AI Tools: Extract structured data from invoices, receipts, contracts, forms, and any other document type.
- Image AI Tools: Classify images, extract text, analyze visual content, and understand scenes.
- Video AI Tools: Transcribe videos with scene descriptions, search content, and analyze meetings.
- Hub: Browse 50+ pre-built domains and schemas.
How it works
VLM Run MCP Server follows the Model Context Protocol standard, acting as the bridge between your AI client and powerful visual processing capabilities.
Configure your MCP client
Add our hosted server https://mcp.vlm.run/${VLMRUN_API_KEY}/sse to your MCP client configuration. It works with Claude Desktop, the OpenAI API, the Gemini SDK, or any MCP-compatible platform.
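As a quick sketch, the per-key SSE endpoint can be assembled from the environment variable above. The URL pattern is the one documented here; the `YOUR_API_KEY` fallback is just an illustrative placeholder:

```python
import os
from urllib.parse import urlparse

# Read the API key from the environment (see the VLM Run dashboard);
# "YOUR_API_KEY" is a placeholder fallback for illustration only.
api_key = os.environ.get("VLMRUN_API_KEY", "YOUR_API_KEY")

# Hosted SSE endpoint: the key is embedded in the URL path.
server_url = f"https://mcp.vlm.run/{api_key}/sse"

# Sanity-check the shape of the URL before handing it to an MCP client.
parsed = urlparse(server_url)
assert parsed.scheme == "https" and parsed.netloc == "mcp.vlm.run"
print(server_url)
```

You would then pass `server_url` to your MCP client's SSE transport (for example, `sse_client(server_url)` in the official Python MCP SDK).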
Agent discovers available tools
Your agent automatically discovers all VLM Run tools: parse_image, parse_document, put_image_url, put_file_url, and so on.
Natural conversation triggers tools
Simply ask your agent to process visual content. Behind the scenes, it calls the appropriate VLM Run MCP tools with your files and requirements.
Get actionable results
Your AI receives structured data and can immediately use it: extract invoice totals for accounting, create meeting summaries from videos, or generate privacy-compliant documents for sharing.
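Under the hood, a tool invocation is a JSON-RPC 2.0 `tools/call` request, as defined by the Model Context Protocol spec. The sketch below builds such a request by hand: the tool name `parse_document` appears in the catalog above, but the argument key `url` and the example file are assumptions, so consult the tools page for the real input schema:

```python
import json

# JSON-RPC 2.0 request envelope used by MCP for tool calls.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        # Tool name from the VLM Run catalog above.
        "name": "parse_document",
        # Argument key is an assumed schema -- check the tools page.
        "arguments": {"url": "https://example.com/invoice.pdf"},
    },
}

payload = json.dumps(request)
print(payload)
```

In practice your MCP client constructs and sends this for you; the agent simply decides when to call which tool.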
Try our MCP server today
Head over to our MCP server to start building your own document-processing pipeline with VLM Run. Sign up for access on our platform.