The VLM Run MCP server gives any MCP-compatible AI agent the ability to see and understand visual content - a capability that’s typically missing in LLMs. No complex API integrations needed - just connect your AI agent to our hosted MCP server and instantly unlock the power to process images, documents, videos, and other visual content.

Bridging computer-vision tools to AI agents through language

VLM Run MCP instantly augments your LLM agents with advanced visual-processing capabilities, no manual integration required. With VLM Run MCP tools, your AI agent can analyze images, extract data from visually complex PDFs, and even process audio and video. The LLM agent automatically selects the right tool for each task.

Let’s take a look at a few use-cases that can be automated with VLM Run MCP tools.

Installation

1. Get your API key

Head over to the VLM Run Dashboard to get your API key ($VLMRUN_API_KEY). We’ll use this to authenticate your requests to the MCP server next.

2. Add the server to your MCP client

Add our hosted server URL below to your MCP client configuration. It works with Claude Desktop, the OpenAI API, the Gemini SDK, or any MCP-compatible platform.

https://mcp.vlm.run/${VLMRUN_API_KEY}/sse

Authentication via the API key embedded in the URL is experimental and subject to change. We'll soon announce OAuth 2.1-based authentication, as specified by the MCP spec.
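For example, a Claude Desktop configuration might look like the following sketch. It assumes the `mcp-remote` proxy package (Claude Desktop configs launch a local command rather than connecting to a URL directly), and the server name `vlm-run` is an arbitrary label; substitute your actual API key for `YOUR_API_KEY`:

```json
{
  "mcpServers": {
    "vlm-run": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.vlm.run/YOUR_API_KEY/sse"]
    }
  }
}
```

Clients that support remote SSE servers natively can use the URL directly instead of the proxy.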

3. Ping the MCP server to test

Paste the server URL above into your browser and you should see a ping response from the MCP server.
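If you prefer the command line, here is a quick sketch using curl, assuming `VLMRUN_API_KEY` is exported in your environment:

```shell
# Build the server URL from your API key and hit the SSE endpoint.
SERVER_URL="https://mcp.vlm.run/${VLMRUN_API_KEY}/sse"
echo "Pinging ${SERVER_URL}"
# -N disables output buffering so SSE events print as they arrive;
# --max-time bounds the request, since SSE streams otherwise stay open.
curl -N --max-time 5 "$SERVER_URL" || true
```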

4. Start building your agent with VLM Run MCP

Head over to the quickstart page and the tools page to get started with VLM Run MCP tools. Our intro MCP notebook is also a great starting point.

Current Capabilities

Take a quick look at the current catalog of visual AI tools available through the VLM Run MCP server. We're constantly adding new tools and capabilities, so this list is always evolving. Join our Discord channel to stay updated on the latest features and to request new tools.

Core Processing Tools

  • I/O Tools: Load images, files, and other objects into the system for processing by other tools.
  • Document AI Tools: Extract structured data from invoices, receipts, contracts, forms, and any document type.
  • Image AI Tools: Classify images, extract text, analyze visual content, and understand scenes.
  • Video AI Tools: Transcribe videos with scene descriptions, search content, and analyze meetings.
  • Hub: Browse 50+ pre-built domains and schemas.

How it works

VLM Run MCP Server follows the Model Context Protocol standard, acting as the bridge between your AI client and powerful visual processing capabilities.

1. Configure your MCP client

Add our hosted server https://mcp.vlm.run/${VLMRUN_API_KEY}/sse to your MCP client configuration. It works with Claude Desktop, the OpenAI API, the Gemini SDK, or any MCP-compatible platform.

2. Agent discovers available tools

Your agent automatically discovers all VLM Run tools: parse_image, parse_document, put_image_url, put_file_url, and so on.
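Tool discovery can also be done programmatically. Here is a minimal sketch using the official MCP Python SDK (the `mcp` package); the tool names above come from these docs, while the function name `list_vlm_run_tools` is our own:

```python
import asyncio

async def list_vlm_run_tools(api_key: str) -> list[str]:
    """Connect to the hosted VLM Run MCP server and list its tools."""
    # Imported lazily so the sketch reads cleanly even without the SDK installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    url = f"https://mcp.vlm.run/{api_key}/sse"
    async with sse_client(url) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()          # MCP handshake
            result = await session.list_tools() # standard tools/list request
            return [tool.name for tool in result.tools]

# To run: asyncio.run(list_vlm_run_tools("your-api-key"))
# Expect names like parse_image, parse_document, put_image_url, put_file_url.
```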

3. Natural conversation triggers tools

Simply ask your agent to process visual content. Behind the scenes, it calls the appropriate VLM Run MCP tools with your files and requirements.
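For illustration, the same tool call an agent makes behind the scenes can be made explicitly. This sketch uses the MCP Python SDK's `call_tool`; the tool name `parse_document` comes from these docs, but the argument name `url` is an assumption — check the tools page for the exact input schema:

```python
import asyncio

async def extract_invoice(api_key: str, file_url: str):
    """Ask the VLM Run MCP server to parse a document at a public URL."""
    from mcp import ClientSession          # official MCP Python SDK
    from mcp.client.sse import sse_client

    server = f"https://mcp.vlm.run/{api_key}/sse"
    async with sse_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # An agent normally chooses this call itself; here it is explicit.
            # "url" is a hypothetical argument name for this sketch.
            return await session.call_tool("parse_document", {"url": file_url})

# To run: asyncio.run(extract_invoice("your-api-key", "https://example.com/invoice.pdf"))
```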

4. Get actionable results

Your AI receives structured data and can immediately use it - extract invoice totals for accounting, create meeting summaries from videos, or generate privacy-compliant documents for sharing.

Try our MCP server today

Head over to our MCP server to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.