Introduction
Extract JSON from images, videos, and documents with a unified API.
What is VLM-1?
VLM-1 is a highly specialized Vision-Language Model that excels at extracting structured JSON from images, videos, and documents. It is designed as a versatile API that can be fine-tuned for a variety of domains, including sports, news, finance, and more — in effect, ETL for any visual content. By leveraging VLM-1, enterprises can process and index unstructured visual data directly into their existing JSON databases, transforming raw multi-modal information into valuable insights and opportunities.
Overview of multi-modal AI understanding with VLM-1.
What makes VLM-1 unique?
Here are some key features of VLM-1 that set it apart from other foundation models and APIs:
Structured Outputs
Robustly extract JSON from a variety of visual inputs such as images, videos, and PDFs, and automate your visual workflows.
Fine-tuning
Fine-tune our models for specific domains and confidently embed vision in your application with enterprise-grade SLAs.
Scalable
Scale your workloads confidently without being rate-limited or worrying about costs spiraling out of control.
Private Deployments
Deploy your custom models on-prem or in a private cloud, and keep your data secure and private.
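To make the structured-outputs idea concrete, here is a minimal sketch of the consuming side of such a workflow: defining the JSON schema you expect and validating an extraction result against it. The `Invoice` schema, field names, and response body are illustrative assumptions, not part of the VLM-1 API.

```python
# Hypothetical sketch: validating structured JSON returned by a
# vision-language extraction API against a schema you define.
# The Invoice fields and response shape below are illustrative only.
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    vendor: str
    total: float
    currency: str


def parse_extraction(raw: str) -> Invoice:
    """Parse an API response body into the typed schema, failing
    loudly on missing or extra keys so bad extractions surface early."""
    data = json.loads(raw)
    expected = {f.name for f in fields(Invoice)}
    if set(data) != expected:
        raise ValueError(f"schema mismatch: got {sorted(data)}")
    return Invoice(**data)


# Example response body a structured-output API might return:
raw = '{"vendor": "Acme Corp", "total": 1249.99, "currency": "USD"}'
invoice = parse_extraction(raw)
```

Validating against an explicit schema like this is what lets extracted JSON flow straight into an existing database instead of requiring manual cleanup.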
Let’s get started!
Below you’ll find the API reference and code samples so you can start building for your use case. Get on our waitlist for an API key below, then check out some of our cookbooks to learn how to use VLM-1 to perform fast, structured extraction on your visual data.