Parsing Presentations
Extract structured data from rich visual PDFs and presentations.
Getting Started
VLM-1 can extract structured data from rich visual PDFs and presentations. Here’s an example of a slide image from a presentation and the structured JSON output that VLM-1 can extract:
Sample image from a financial presentation.
{
"id": "...",
"created_at": "...",
"completed_at": "...",
"status": "completed",
"response": {
"description": "The document details the differentiated operating model of Selective Insurance, highlighting its unique, locally based field model and franchise value distribution model with high-quality partners. It also includes a pie chart showing the distribution of 2023 net premiums written.",
"title": "Differentiated Operating Model",
"page_number": 7,
"tables": [
{
"description": "The table highlights two key aspects of Selective Insurance's operating model: its unique, locally based field model and its franchise value distribution model with high-quality partners. It includes details on the locally based specialists, distribution partners, office locations, and quotes from partners.",
"title": "Differentiated Operating Model Overview",
"caption": null,
"markdown": "| Aspect | Description |\n|-----------------------------------------------------|-----------------------------------------------------------------------------------------------------|\n| Unique, locally based field model | - Locally based underwriting, claims, and safety management specialists |\n| | - Proven ability to develop and integrate actionable tools |\n| | - Enables effective portfolio management in an uncertain loss trend environment |\n| Franchise value distribution model with high-quality partners | - Approximately 1,550 distribution partners selling standard lines products and services through approximately 2,650 office locations|\n| | - ~850 of these distribution partners sell personal lines products |\n| | - ~90 wholesale agents sell E&S business |\n| | - ~6,400 distribution partners sell National Flood Insurance Program products across 50 states |\n| Quote from Selective Agent | \"Everyone with Selective makes our customers feel like the #1 priority. The ease of working with Selective is unmatched.\" |"
}
],
"charts": [
{
"type": "pie",
"description": "Pie chart showing the distribution of 2023 net premiums written totaling $4 billion, with segments for Standard Commercial Lines (79%), Standard Personal Lines (10%), and Excess and Surplus Lines (11%).",
"title": "2023 Net Premiums Written",
"caption": null,
"markdown": "| Category | Percentage |\n|------------------------------|------------|\n| Standard Commercial Lines | 79% |\n| Standard Personal Lines | 10% |\n| Excess and Surplus Lines | 11% |"
}
]
}
}
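Every response shares this schema: a top-level response object whose tables and charts lists hold entries with title, caption, description and markdown fields. Either list can be null, as in the example output later in this guide. Because the shape is fixed, downstream code can walk it generically. The sketch below assumes the response has already been parsed into a Python dict. `list_visual_elements` is a hypothetical helper and is not part of the API.

```python
import json
from typing import Any, Dict, List, Tuple


def list_visual_elements(result: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (kind, label) pairs for each table and chart in a VLM-1 response.

    Hypothetical helper: `kind` is "table" or the chart's `type` (e.g. "pie"),
    and `label` falls back from `title` to `caption` to "(untitled)".
    """
    response = result.get("response") or {}
    elements: List[Tuple[str, str]] = []
    # `tables` and `charts` can be null, so treat None as an empty list.
    for table in response.get("tables") or []:
        label = table.get("title") or table.get("caption") or "(untitled)"
        elements.append(("table", label))
    for chart in response.get("charts") or []:
        label = chart.get("title") or chart.get("caption") or "(untitled)"
        elements.append((chart.get("type") or "chart", label))
    return elements


# A trimmed copy of the response shown above.
example = json.loads("""
{
  "status": "completed",
  "response": {
    "title": "Differentiated Operating Model",
    "page_number": 7,
    "tables": [{"title": "Differentiated Operating Model Overview", "caption": null}],
    "charts": [{"type": "pie", "title": "2023 Net Premiums Written", "caption": null}]
  }
}
""")
for kind, label in list_visual_elements(example):
    print(f"{kind}: {label}")
# table: Differentiated Operating Model Overview
# pie: 2023 Net Premiums Written
```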
Notebook Example
In this notebook, we use the VLM-1 model to understand financial presentations. As an example, we use a dataset of financial presentations from various companies, sourced from the SEC EDGAR database. We use VLM-1 to extract information from these presentations and then analyze the data.
We call the VLM-1 API with the Python requests library, using the /image/generate endpoint to extract visual information from the presentation slides.
import json
import os
import requests
VLM_BASE_URL = "https://api.vlm.run/v1"
response = requests.get(f"{VLM_BASE_URL}/health")
response.raise_for_status()
assert response.status_code == 200
response.json()
{'status': 'ok'}
Next, we authenticate with an API key and list the models available in the VLM-1 API.
import getpass

# Read the API key from the environment, or prompt for it interactively.
VLM_API_KEY = os.getenv("VLM_API_KEY", None)
if VLM_API_KEY is None:
    VLM_API_KEY = getpass.getpass()

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {VLM_API_KEY}",
}
response = requests.get(f"{VLM_BASE_URL}/models", headers=headers)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
[
{
"model": "vlm-1",
"domain": "document.file"
},
{
"model": "vlm-1",
"domain": "video.tv-intelligence"
},
{
"model": "vlm-1",
"domain": "document.generative"
},
{
"model": "vlm-1",
"domain": "video.tv-news"
},
{
"model": "vlm-1",
"domain": "document.pdf"
},
{
"model": "vlm-1",
"domain": "document.presentation"
}
]
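The /models endpoint returns a flat list of model/domain pairs, as shown above. Before sending any requests, you can check that the document.presentation domain used in this guide is actually served. `domains_for` is a hypothetical helper, and the sample list is copied from the output above. In the notebook you would pass `response.json()` instead.

```python
from typing import Dict, List


def domains_for(models: List[Dict[str, str]], model: str = "vlm-1", prefix: str = "") -> List[str]:
    """Return the sorted domains served by `model` whose names start with `prefix`."""
    return sorted(
        entry["domain"]
        for entry in models
        if entry.get("model") == model and entry.get("domain", "").startswith(prefix)
    )


# Copied from the /models output above.
models = [
    {"model": "vlm-1", "domain": "document.file"},
    {"model": "vlm-1", "domain": "video.tv-intelligence"},
    {"model": "vlm-1", "domain": "document.generative"},
    {"model": "vlm-1", "domain": "video.tv-news"},
    {"model": "vlm-1", "domain": "document.pdf"},
    {"model": "vlm-1", "domain": "document.presentation"},
]
document_domains = domains_for(models, prefix="document.")
print(document_domains)
# ['document.file', 'document.generative', 'document.pdf', 'document.presentation']
assert "document.presentation" in document_domains
```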
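We'll call VLM-1 through two helper functions. vlm() builds the request payload and posts it to the /image/generate endpoint. vlm_visualize() accepts a URL, a local path or a PIL image, sends it to the API and displays the result. These helpers rely on a few utilities, such as encode_image and download_image, that are defined in the Colab notebook. See the notebook for more details.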
from pathlib import Path
from typing import Union

from PIL import Image

# encode_image() and download_image() are utilities defined in the Colab notebook.


def vlm(image: Image.Image, domain: str):
    """Send an image to the VLM API."""
    data = {
        "model": "vlm-1",
        "domain": domain,
        "image": encode_image(image),
    }
    response = requests.post(f"{VLM_BASE_URL}/image/generate", headers=headers, json=data)
    response.raise_for_status()
    return response.json()


def vlm_visualize(image: Union[Image.Image, str, Path], domain: str):
    """Send an image to the VLM API and display the result."""
    # Normalize the input (URL, local path, or PIL image) to an RGB PIL image.
    if isinstance(image, str) and image.startswith("http"):
        image = download_image(image)
    elif isinstance(image, (str, Path)):
        if not Path(image).exists():
            raise FileNotFoundError(f"File not found: {image}")
        image = Image.open(str(image)).convert("RGB")
    elif isinstance(image, Image.Image):
        image = image.convert("RGB")
    else:
        raise ValueError("Invalid image, must be a path, PIL Image or URL")
    # Call the API and pretty-print the JSON response.
    result = vlm(image, domain=domain)
    print(json.dumps(result, indent=2))
    return result
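vlm() and vlm_visualize() depend on two utilities, encode_image and download_image, which are defined in the Colab notebook and not reproduced on this page. If you run the helpers outside the notebook, the following stand-ins may work. They assume the API accepts a plain base64-encoded JPEG string in the image field, and that slide URLs can be fetched with a simple HTTP GET. Both functions are hypothetical, so check the notebook or the API reference for the exact encoding the endpoint expects.

```python
import base64
import io
import urllib.request

from PIL import Image


def encode_image(image: Image.Image, fmt: str = "JPEG") -> str:
    """Hypothetical stand-in: serialize a PIL image to a base64 string.

    Assumes the API accepts plain base64 image bytes; the notebook's own
    helper may use a different format or add a data-URI prefix.
    """
    buffer = io.BytesIO()
    # JPEG cannot store an alpha channel, so normalize to RGB first.
    image.convert("RGB").save(buffer, format=fmt)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def download_image(url: str, timeout: float = 30.0) -> Image.Image:
    """Hypothetical stand-in: fetch an image over HTTP and return it as RGB."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    return Image.open(io.BytesIO(data)).convert("RGB")
```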
Example Output
Now let's try this out on another example slide.
url = "https://raw.githubusercontent.com/autonomi-ai/vlm-cookbook/main/assets/financial-presentations/sigifirstquarter2024inve015.jpg"
vlm_visualize(url, domain="document.presentation")
{
"id": "95c76a66-4f9f-4a6f-b318-fcebeabae449",
"created_at": "2024-08-13T23:26:51.916563",
"completed_at": "2024-08-13T23:26:59.832087",
"response": {
"description": "The document from Selective Insurance describes the impact of their portfolio management approach on business mix improvements. It contains a bar chart along with a pie chart and supporting text.",
"title": null,
"page_number": 15,
"tables": null,
"charts": [
{
"type": "bar",
"description": "The bar chart illustrates the Renewal Pure Price and Point of Renewal Retention across different retention groups: Excellent, Above Average, Average, Below Average, and Low & Very Low. It demonstrates that higher retention is linked with lower pricing.",
"title": null,
"caption": "Standard Commercial Lines Pricing by Retention Group",
"markdown": "| Retention Group | Renewal Pure Price | Point of Renewal Retention | % of Premium |\n|-------------------|--------------------|----------------------------|--------------|\n| Excellent | ~7% | ~92% | 15% |\n| Above Average | ~10% | ~90% | 14% |\n| Average | ~11% | ~89% | 47% |\n| Below Average | ~11.5% | ~87% | 16% |\n| Low & Very Low | ~13% | ~75% | 8% |\n\n_As of December 31, 2023_"
},
{
"type": "pie",
"description": "The pie chart depicts the mix of Direct Premium Written (DPW) in 2023 across various segments. The segments include Contractors, Mercantile & Services, Community & Public Services, Manufacturing & Wholesale, and Bonds.",
"title": null,
"caption": "2023 DPW Mix",
"markdown": "| Business Segment | Percentage |\n|-------------------------------|------------|\n| Contractors | 44% |\n| Mercantile & Services | 25% |\n| Community & Public Services | 16% |\n| Manufacturing & Wholesale | 14% |\n| Bonds | 1% |"
}
]
},
"status": "completed"
}
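Each chart's markdown field is a pipe table, so you can turn it back into rows for analysis. The sketch below handles simple tables like the 2023 DPW Mix chart above. It assumes one row per line and no escaped pipes inside cells, and it skips non-table lines such as the trailing "_As of December 31, 2023_" note. `markdown_table_to_rows` is a hypothetical helper.

```python
from typing import Dict, List


def markdown_table_to_rows(markdown: str) -> List[Dict[str, str]]:
    """Parse a simple pipe-delimited markdown table into a list of row dicts."""
    # Keep only table lines; this drops blank lines and trailing notes.
    lines = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def split_cells(line: str) -> List[str]:
        # Strip the outer pipes, then split and trim each cell.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_cells(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at lines[2].
    return [dict(zip(header, split_cells(line))) for line in lines[2:]]


# The "2023 DPW Mix" markdown from the response above.
dpw_mix = (
    "| Business Segment | Percentage |\n|-------------------------------|------------|\n"
    "| Contractors | 44% |\n| Mercantile & Services | 25% |\n"
    "| Community & Public Services | 16% |\n| Manufacturing & Wholesale | 14% |\n| Bonds | 1% |"
)
rows = markdown_table_to_rows(dpw_mix)
total = sum(float(row["Percentage"].rstrip("%")) for row in rows)
print(rows[0], total)
# {'Business Segment': 'Contractors', 'Percentage': '44%'} 100.0
```

Checking that the pie-chart shares sum to 100% is one quick sanity check on an extraction.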
The extracted markdown tables can also be rendered inline in the notebook.
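In Jupyter or Colab, IPython's display and Markdown helpers render markdown strings as formatted tables. The sketch below gathers every table and chart in a response into one markdown document under per-item headings, then displays it. It falls back to printing the raw text when IPython is not installed. `response_to_markdown` and `show_markdown` are hypothetical helpers that rely on the response schema shown above.

```python
from typing import Any, Dict, List


def response_to_markdown(result: Dict[str, Any]) -> str:
    """Concatenate the markdown of every table and chart, each under a heading."""
    response = result.get("response") or {}
    sections: List[str] = []
    # `tables` and `charts` may be null, so fall back to empty lists.
    for item in (response.get("tables") or []) + (response.get("charts") or []):
        heading = item.get("title") or item.get("caption") or "Untitled"
        sections.append(f"### {heading}\n\n{item.get('markdown', '')}")
    return "\n\n".join(sections)


def show_markdown(result: Dict[str, Any]) -> None:
    """Render inline in a notebook; print the raw markdown elsewhere."""
    text = response_to_markdown(result)
    try:
        from IPython.display import Markdown, display
    except ImportError:
        print(text)
        return
    display(Markdown(text))
```

For example, show_markdown(vlm(download_image(url), domain="document.presentation")) renders both charts from the slide above.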
Get Started with our Document -> JSON API
Head over to our Document -> JSON page to start building your own document processing pipeline with VLM-1, and sign up for access to our API.