Getting Started

VLM-1 can extract structured data from visually rich PDFs and presentations. Here’s an example of a slide image from a presentation and the structured JSON output that VLM-1 extracts from it:

Sample image from a financial presentation.

{
"id": "...",
"created_at": "...",
"completed_at": "...",
"status": "completed",
"response": {
  "description": "The document details the differentiated operating model of Selective Insurance, highlighting its unique, locally based field model and franchise value distribution model with high-quality partners. It also includes a pie chart showing the distribution of 2023 net premiums written.",
  "title": "Differentiated Operating Model",
  "page_number": 7,
  "tables": [
    {
      "description": "The table highlights two key aspects of Selective Insurance's operating model: its unique, locally based field model and its franchise value distribution model with high-quality partners. It includes details on the locally based specialists, distribution partners, office locations, and quotes from partners.",
      "title": "Differentiated Operating Model Overview",
      "caption": null,
      "markdown": "| Aspect                                              | Description                                                                                         |\n|-----------------------------------------------------|-----------------------------------------------------------------------------------------------------|\n| Unique, locally based field model                   | - Locally based underwriting, claims, and safety management specialists                              |\n|                                                     | - Proven ability to develop and integrate actionable tools                                          |\n|                                                     | - Enables effective portfolio management in an uncertain loss trend environment                      |\n| Franchise value distribution model with high-quality partners | - Approximately 1,550 distribution partners selling standard lines products and services through approximately 2,650 office locations|\n|                                                     | - ~850 of these distribution partners sell personal lines products                                      |\n|                                                     | - ~90 wholesale agents sell E&S business                                                             |\n|                                                     | - ~6,400 distribution partners sell National Flood Insurance Program products across 50 states        |\n| Quote from Selective Agent                          | \"Everyone with Selective makes our customers feel like the #1 priority. The ease of working with Selective is unmatched.\"             |"
    }
  ],
  "charts": [
    {
      "type": "pie",
      "description": "Pie chart showing the distribution of 2023 net premiums written totaling $4 billion, with segments for Standard Commercial Lines (79%), Standard Personal Lines (10%), and Excess and Surplus Lines (11%).",
      "title": "2023 Net Premiums Written",
      "caption": null,
      "markdown": "| Category                     | Percentage |\n|------------------------------|------------|\n| Standard Commercial Lines    | 79%        |\n| Standard Personal Lines      | 10%        |\n| Excess and Surplus Lines     | 11%        |"
    }
  ]
}
}
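Once the JSON above is parsed into a Python dict, its fields can be read directly. The following is a minimal sketch, assuming the result has the shape shown in the example; `summarize_response` is a hypothetical helper and not part of the VLM-1 API. It treats a missing or null `tables` or `charts` field as an empty list.

```python
from typing import Any


def summarize_response(result: dict[str, Any]) -> dict[str, Any]:
    """Return a compact summary of a VLM-1 style result (hypothetical helper)."""
    response = result.get("response") or {}
    # "tables" and "charts" may be absent or null, so fall back to empty lists.
    tables = response.get("tables") or []
    charts = response.get("charts") or []
    return {
        "status": result.get("status"),
        "title": response.get("title"),
        "page_number": response.get("page_number"),
        "num_tables": len(tables),
        "chart_types": [chart.get("type") for chart in charts],
    }


# Trimmed-down copy of the example output above.
example = {
    "status": "completed",
    "response": {
        "title": "Differentiated Operating Model",
        "page_number": 7,
        "tables": [{"title": "Differentiated Operating Model Overview"}],
        "charts": [{"type": "pie", "title": "2023 Net Premiums Written"}],
    },
}
print(summarize_response(example))
```

Checking `status` before reading `response` is a sensible habit, since the example only shows the `completed` case.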

Notebook Example

If you just want to see the code, skip directly to the Colab notebook linked here.

In this notebook, we use VLM-1 to understand financial presentations. As an example, we use a dataset of financial presentations from various companies, drawn from the SEC EDGAR database. We extract information from these presentations with VLM-1 and then analyze the data.

We call the VLM-1 API using the Python requests library and use its generate endpoint to extract visual information from the presentation slides. First, we check that the API is reachable through the health endpoint.

import json
import os
import requests


VLM_BASE_URL = "https://api.vlm.run/v1"
response = requests.get(f"{VLM_BASE_URL}/health")
response.raise_for_status()
assert response.status_code == 200
response.json()
{'status': 'ok'}

Now, let’s list the models available through the VLM-1 API.

import getpass

# Read the API key from the environment, or prompt for it interactively.
VLM_API_KEY = os.getenv("VLM_API_KEY", None)
if VLM_API_KEY is None:
    VLM_API_KEY = getpass.getpass()

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {VLM_API_KEY}",
}
response = requests.get(f"{VLM_BASE_URL}/models", headers=headers)
response.raise_for_status()

print(json.dumps(response.json(), indent=2))
[
  {
    "model": "vlm-1",
    "domain": "document.file"
  },
  {
    "model": "vlm-1",
    "domain": "video.tv-intelligence"
  },
  {
    "model": "vlm-1",
    "domain": "document.generative"
  },
  {
    "model": "vlm-1",
    "domain": "video.tv-news"
  },
  {
    "model": "vlm-1",
    "domain": "document.pdf"
  },
  {
    "model": "vlm-1",
    "domain": "document.presentation"
  }
]
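Each entry in the list pairs the model name with a domain, and the domain names share a prefix such as `document` or `video`. As a rough sketch, the entries can be grouped by that prefix to see which domains apply to a given input type. `group_domains` is a hypothetical helper, and the list below is copied from the output above rather than fetched from the API.

```python
from collections import defaultdict


def group_domains(models: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group domain names by the prefix before the first dot (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in models:
        family, _, _ = entry["domain"].partition(".")
        grouped[family].append(entry["domain"])
    # Sort each group so the result is stable regardless of API ordering.
    return {family: sorted(names) for family, names in grouped.items()}


# Copied from the /models output shown above.
models = [
    {"model": "vlm-1", "domain": "document.file"},
    {"model": "vlm-1", "domain": "video.tv-intelligence"},
    {"model": "vlm-1", "domain": "document.generative"},
    {"model": "vlm-1", "domain": "video.tv-news"},
    {"model": "vlm-1", "domain": "document.pdf"},
    {"model": "vlm-1", "domain": "document.presentation"},
]

for family, names in group_domains(models).items():
    print(family, names)
```

For the slides in this notebook, the relevant entry is `document.presentation`.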

We’ll call VLM-1 through a helper function that builds the request payload and posts it to the generate endpoint. Note that this relies on a few utilities (encode_image and download_image) defined in the Colab notebook. Take a look at the link above for more details.

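from pathlib import Path
from typing import Union

import requests
from PIL import Image

# encode_image and download_image are utilities defined in the Colab notebook.


def vlm(image: Image.Image, domain: str):
    """Send an image to the VLM API and return the parsed JSON response."""
    data = {
        "model": "vlm-1",
        "domain": domain,
        "image": encode_image(image),
    }
    response = requests.post(f"{VLM_BASE_URL}/image/generate", headers=headers, json=data)
    response.raise_for_status()
    return response.json()


def vlm_visualize(image: Union[Image.Image, str, Path], domain: str):
    """Send an image to the VLM API and display the result."""
    # Normalize the input (URL, local path, or PIL image) to an RGB PIL image.
    if isinstance(image, str) and image.startswith("http"):
        image = download_image(image)
    elif isinstance(image, (str, Path)):
        if not Path(image).exists():
            raise FileNotFoundError(f"File not found {image}")
        image = Image.open(str(image)).convert("RGB")
    elif isinstance(image, Image.Image):
        image = image.convert("RGB")
    else:
        raise ValueError("Invalid image, must be a path, PIL Image or URL")

    # Assumed completion (the original snippet stops after input handling):
    # send the normalized image and return the JSON so the notebook displays it.
    return vlm(image, domain)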
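The `encode_image` utility used by the helper is defined in the notebook and is not shown here. One common way to embed an image in a JSON payload is as a base64 data URL. The sketch below illustrates that idea on raw bytes using only the standard library. The function names and the data-URL format are assumptions, so check the notebook for the encoding the API actually expects.

```python
import base64


def encode_image_bytes(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (assumed format, not confirmed)."""
    payload = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def decode_image_bytes(data_url: str) -> bytes:
    """Recover the raw bytes from a data URL produced by encode_image_bytes."""
    _, _, payload = data_url.partition(",")
    return base64.b64decode(payload)


# Round-trip a few bytes (the first three bytes of a JPEG file).
sample = b"\xff\xd8\xff"
encoded = encode_image_bytes(sample)
print(encoded)
print(decode_image_bytes(encoded) == sample)
```

Base64 inflates the payload by roughly a third, which is worth keeping in mind when sending large slide images.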

Example Output

Now let’s try this out on another example slide.

url = "https://raw.githubusercontent.com/autonomi-ai/vlm-cookbook/main/assets/financial-presentations/sigifirstquarter2024inve015.jpg"
vlm_visualize(url, domain="document.presentation")
{
  "id": "95c76a66-4f9f-4a6f-b318-fcebeabae449",
  "created_at": "2024-08-13T23:26:51.916563",
  "completed_at": "2024-08-13T23:26:59.832087",
  "response": {
    "description": "The document from Selective Insurance describes the impact of their portfolio management approach on business mix improvements. It contains a bar chart along with a pie chart and supporting text.",
    "title": null,
    "page_number": 15,
    "tables": null,
    "charts": [
      {
        "type": "bar",
        "description": "The bar chart illustrates the Renewal Pure Price and Point of Renewal Retention across different retention groups: Excellent, Above Average, Average, Below Average, and Low & Very Low. It demonstrates that higher retention is linked with lower pricing.",
        "title": null,
        "caption": "Standard Commercial Lines Pricing by Retention Group",
        "markdown": "| Retention Group   | Renewal Pure Price | Point of Renewal Retention | % of Premium |\n|-------------------|--------------------|----------------------------|--------------|\n| Excellent         | ~7%                | ~92%                       | 15%          |\n| Above Average     | ~10%               | ~90%                       | 14%          |\n| Average           | ~11%               | ~89%                       | 47%          |\n| Below Average     | ~11.5%             | ~87%                       | 16%          |\n| Low & Very Low    | ~13%               | ~75%                       | 8%           |\n\n_As of December 31, 2023_"
      },
      {
        "type": "pie",
        "description": "The pie chart depicts the mix of Direct Premium Written (DPW) in 2023 across various segments. The segments include Contractors, Mercantile & Services, Community & Public Services, Manufacturing & Wholesale, and Bonds.",
        "title": null,
        "caption": "2023 DPW Mix",
        "markdown": "| Business Segment                       | Percentage |\n|-------------------------------|------------|\n| Contractors                   | 44%        |\n| Mercantile & Services         | 25%        |\n| Community & Public Services   | 16%        |\n| Manufacturing & Wholesale     | 14%        |\n| Bonds                         | 1%         |"
      }
    ]
  },
  "status": "completed"
}

We can then render the markdown field of each chart inline.
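Beyond rendering, the `markdown` field can also be turned into rows for further analysis. Here is a minimal sketch, assuming simple pipe-delimited tables like the ones in the output above. `parse_markdown_table` is a hypothetical helper that skips separator rows and non-table lines such as footnotes, and it does not handle escaped pipes inside cells.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited markdown table into a list of row dicts."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip footnotes and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # skip the header separator row
        rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# The "2023 DPW Mix" chart markdown from the output above, with padding trimmed.
dpw_mix = (
    "| Business Segment | Percentage |\n"
    "|------------------|------------|\n"
    "| Contractors | 44% |\n"
    "| Mercantile & Services | 25% |\n"
    "| Community & Public Services | 16% |\n"
    "| Manufacturing & Wholesale | 14% |\n"
    "| Bonds | 1% |\n"
    "\n"
    "_Footnote lines like this one are ignored_"
)

for row in parse_markdown_table(dpw_mix):
    print(row["Business Segment"], row["Percentage"])
```

In a notebook, the same string could instead be passed to a markdown display utility to render it as a formatted table.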

Get Started with our Document -> JSON API

Head over to our Document -> JSON API to start building your own document processing pipeline with VLM-1. Sign up for access to our API here.