The `MarkdownDocument` schema is the cornerstone of VLM Run’s document processing system, providing a standardized, machine-readable representation of complex documents. This technical reference guide details the schema’s architecture, components, and implementation patterns.
`MarkdownDocument` Data Model
The `MarkdownDocument` schema addresses the fundamental challenges in document processing:
Structural Preservation: Maintains document hierarchy and relationships
Content Extraction: Handles mixed content types (text, tables, figures, code)
Spatial Understanding: Preserves layout and positioning information
Data Integrity: Ensures accurate representation of structured elements
Extensibility: Supports custom annotations and metadata
1. MarkdownPage
A `MarkdownDocument` is a list of `MarkdownPage` objects, each representing a page in the document.
The table below summarizes the `MarkdownPage` schema and its related components:
Tabular Representation of `MarkdownPage`
| Component | Field | Type | Description |
|-----------|-------|------|-------------|
| MarkdownDocument | `pages` | `List[MarkdownPage]` | Pages in the document |
| MarkdownPage | `metadata` | `PageMetadata` | Metadata of the page |
| | `tables` | `List[Table]` | Tables in the page |
| | `figures` | `List[Figure]` | Figures in the page |
| | `content` | `str` | Content of the page |
| PageMetadata | `language` | `str` | Language of the document |
| | `page_number` | `int` | Page number of the document (0-indexed) |
| Table | `metadata.title` | `str` | Title of the table |
| | `metadata.caption` | `str` | Caption of the table |
| | `metadata.notes` | `str` | Notes about the table |
| | `headers.id` | `str` | Unique identifier for the header |
| | `headers.column` | `int` | Column index of the header |
| | `headers.name` | `str` | Name of the header |
| | `headers.dtype` | `str` | Data type of the header |
| | `data.*` | `dict[str, Any]` | Maps column header ids to values |
| | `bbox` | `BoxCoords` | Bounding box of the table |
| Figure | `id` | `int` | Unique identifier for the figure |
| | `title` | `str` | Title of the figure |
| | `caption` | `str` | Caption of the figure |
| | `bbox` | `BoxCoords` | Bounding box of the figure |
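
To make the field layout concrete, the components listed above can be mirrored as plain Python dataclasses. This is only an illustrative sketch, not the definitions shipped with the `vlmrun` package: the `TableMetadata` and `TableHeader` groupings, the opaque `BoxCoords` alias, and every default value are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# The bounding-box layout is not specified in this reference, so it is kept opaque.
BoxCoords = Any


@dataclass
class PageMetadata:
    page_number: int  # 0-indexed
    language: Optional[str] = None


@dataclass
class TableMetadata:  # hypothetical grouping of the metadata.* fields
    title: Optional[str] = None
    caption: Optional[str] = None
    notes: Optional[str] = None


@dataclass
class TableHeader:  # hypothetical grouping of the headers.* fields
    id: str
    column: int
    name: str
    dtype: str


@dataclass
class Table:
    metadata: TableMetadata = field(default_factory=TableMetadata)
    headers: List[TableHeader] = field(default_factory=list)
    # Each row maps a header id to the cell value for that column.
    data: List[Dict[str, Any]] = field(default_factory=list)
    bbox: BoxCoords = None


@dataclass
class Figure:
    id: int
    title: Optional[str] = None
    caption: Optional[str] = None
    bbox: BoxCoords = None


@dataclass
class MarkdownPage:
    metadata: PageMetadata
    content: str = ""
    tables: List[Table] = field(default_factory=list)
    figures: List[Figure] = field(default_factory=list)


@dataclass
class MarkdownDocument:
    pages: List[MarkdownPage] = field(default_factory=list)
```

Keeping `tables` and `figures` as lists on each page, separate from `content`, matches the table above: the page text stays readable while the structured elements remain individually addressable.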
2. MarkdownTable
Tables are represented with a `<Table id="tb-{id}"/>` tag in the markdown content, with the actual table content stored in the `tables` list. This allows for a rich representation of the table data while maintaining the document’s flow.
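
As a rough sketch of how these placeholders might be resolved, the hypothetical helper below replaces each `<Table id="tb-N"/>` tag with the markdown stored in the matching table entry. It assumes that `N` is an integer giving the table's position in the page's `tables` list and that each entry carries a markdown `content` string; neither assumption is confirmed by this reference.

```python
import re

# Matches placeholders such as <Table id="tb-0"/>, tolerating optional whitespace.
TABLE_TAG = re.compile(r'<Table\s+id="tb-(\d+)"\s*/>')


def inline_tables(content: str, tables: list) -> str:
    """Replace each <Table id="tb-N"/> tag with tables[N]["content"].

    Assumption: N is the position of the table in the page's `tables` list.
    Tags that point outside the list are left untouched.
    """

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if 0 <= index < len(tables):
            return tables[index]["content"]
        return match.group(0)

    return TABLE_TAG.sub(substitute, content)
```

Leaving unresolved tags in place, rather than raising, keeps the page text intact if the mapping assumption turns out not to hold for a given document.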
Charts and figures are represented with a `<Chart id="ch-{id}"/>` tag in the content. The chart details are stored in the `figures` list, including properties such as the figure’s `id`, `title`, `caption`, and `bbox`.
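
One way to connect `<Chart>` placeholders back to their entries is sketched below. The helper name `referenced_figures` is hypothetical, and the code assumes that the number in `ch-{id}` equals the integer `id` field of an entry in `figures`.

```python
import re

# Matches placeholders such as <Chart id="ch-0"/>, tolerating optional whitespace.
CHART_TAG = re.compile(r'<Chart\s+id="ch-(\d+)"\s*/>')


def referenced_figures(content: str, figures: list) -> list:
    """Return the figure entries referenced by <Chart id="ch-N"/> tags, in reading order.

    Assumption: N equals the integer `id` field of a figure entry.
    Tags with no matching figure are skipped.
    """
    by_id = {figure["id"]: figure for figure in figures}
    found = []
    for match in CHART_TAG.finditer(content):
        figure = by_id.get(int(match.group(1)))
        if figure is not None:
            found.append(figure)
    return found
```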
Example Usage
Here’s an example of how the `MarkdownPage` model is used to process a document:
from pathlib import Path

from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, MarkdownDocument

# Initialize client
client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Process document
response: PredictionResponse = client.document.generate(
    file=Path("document.pdf"),
    domain="document.markdown",
    batch=True,
)

# Access processed document
doc: MarkdownDocument = client.predictions.wait(response.id, timeout=120)
print(doc.model_dump_json(indent=2))
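
Once a document has been retrieved, a common follow-up is to stitch the per-page markdown back into a single string. The helper below is a minimal sketch that works on a plain dictionary with the `pages`, `metadata.page_number`, and `content` fields described earlier; the function name and the `---` page separator are choices made for this example, not part of the SDK.

```python
def document_to_markdown(doc: dict, separator: str = "\n\n---\n\n") -> str:
    """Concatenate the `content` of every page, ordered by page number."""
    pages = sorted(doc["pages"], key=lambda page: page["metadata"]["page_number"])
    return separator.join(page["content"] for page in pages)
```

Assuming `doc` is a Pydantic model, as the `model_dump_json` call suggests, this could be invoked as `document_to_markdown(doc.model_dump())`.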
Example JSON Response
Here’s an example of how the MarkdownPage schema appears in a JSON response:
{
  "pages": [
    { // page 0
      "metadata": {
        "page_number": 0
      },
      "tables": [
        {
          "metadata": {
            "title": "Sample Data Table",
            "caption": "Table showing example data"
          },
          "content": "| Header 1 | Header 2 |\n|----------|----------|\n| Data 1 | Data 2 |\n| Data 3 | Data 4 |",
          "headers": [
            {
              "id": "h1",
              "column": 0,
              "name": "Header 1",
              "dtype": "string"
            },
            ...
          ],
          "data": [
            {
              "h1": "Data 1",
              "h2": "Data 2"
            },
            ...
          ]
        }
      ],
      "figures": [
        {
          "id": 0,
          "title": "Sample Bar Chart",
          "caption": "Example visualization",
          "content": "..."
        },
        ...
      ],
      "content": "..."
    },
    { // page 1
      ...
    },
    { // page 2
      ...
    },
    ...
  ]
}
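
Because each row in `data` is keyed by header id, the `headers` list is what fixes the column order. The sketch below shows one way a table entry shaped like the response above could be exported as CSV; `table_to_csv` is a hypothetical helper, and filling missing cells with empty strings is a choice made for this example.

```python
import csv
import io


def table_to_csv(table: dict) -> str:
    """Render a table entry as CSV, with columns ordered by each header's `column` index."""
    headers = sorted(table["headers"], key=lambda header: header["column"])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header["name"] for header in headers)
    for row in table["data"]:
        # Each row maps header ids to values; missing cells become empty strings.
        writer.writerow(row.get(header["id"], "") for header in headers)
    return buffer.getvalue()
```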