The MarkdownDocument schema is the cornerstone of VLM Run’s document processing system, providing a standardized, machine-readable representation of complex documents. This technical reference guide details the schema’s architecture, components, and implementation patterns.

MarkdownDocument Data Model

The MarkdownDocument schema addresses the fundamental challenges in document processing:
  1. Structural Preservation: Maintains document hierarchy and relationships
  2. Content Extraction: Handles mixed content types (text, tables, figures, code)
  3. Spatial Understanding: Preserves layout and positioning information
  4. Data Integrity: Ensures accurate representation of structured elements
  5. Extensibility: Supports custom annotations and metadata

1. MarkdownPage

A MarkdownDocument holds a list of MarkdownPage objects in its pages field, one per page of the source document.
The table below summarizes the full schema, from MarkdownDocument down to individual tables and figures:
| Component | Field | Type | Description |
|---|---|---|---|
| MarkdownDocument | pages | List[MarkdownPage] | Pages in the document |
| MarkdownPage | metadata | PageMetadata | Metadata of the page |
| MarkdownPage | tables | List[Table] | Tables in the page |
| MarkdownPage | figures | List[Figure] | Figures in the page |
| MarkdownPage | content | str | Content of the page |
| PageMetadata | language | str | Language of the document |
| PageMetadata | page_number | int | Page number within the document (0-indexed) |
| Table | metadata.title | str | Title of the table |
| Table | metadata.caption | str | Caption of the table |
| Table | metadata.notes | str | Notes about the table |
| Table | headers.id | str | Unique identifier for the header |
| Table | headers.column | int | Column index of the header |
| Table | headers.name | str | Name of the header |
| Table | headers.dtype | str | Data type of the header |
| Table | data.* | dict[str, Any] | Maps column header ids to values |
| Table | bbox | BoxCoords | Bounding box of the table |
| Figure | id | int | Unique identifier for the figure |
| Figure | title | str | Title of the figure |
| Figure | caption | str | Caption of the figure |
| Figure | bbox | BoxCoords | Bounding box of the figure |
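To make the nesting concrete, here is a minimal sketch of the same hierarchy as Python dataclasses. These classes are hypothetical stand-ins written for illustration, not the classes shipped in the vlmrun SDK. Field names follow the schema table; the table headers and table metadata are split into their own hypothetical classes (TableHeader, TableMetadata), and BoxCoords is left opaque because its fields are not described in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical stand-ins mirroring the schema table; NOT the vlmrun SDK classes.
# BoxCoords is left opaque (Any) because its fields are not documented here.
BoxCoords = Any


@dataclass
class PageMetadata:
    page_number: int  # 0-indexed
    language: Optional[str] = None


@dataclass
class TableHeader:
    id: str       # unique header id, used as the key in data rows
    column: int   # column index of this header
    name: str
    dtype: str


@dataclass
class TableMetadata:
    title: Optional[str] = None
    caption: Optional[str] = None
    notes: Optional[str] = None


@dataclass
class Table:
    metadata: TableMetadata
    headers: List[TableHeader]
    data: List[Dict[str, Any]]  # each row maps header ids to values
    bbox: Optional[BoxCoords] = None


@dataclass
class Figure:
    id: int
    title: Optional[str] = None
    caption: Optional[str] = None
    bbox: Optional[BoxCoords] = None


@dataclass
class MarkdownPage:
    metadata: PageMetadata
    content: str
    tables: List[Table] = field(default_factory=list)
    figures: List[Figure] = field(default_factory=list)


@dataclass
class MarkdownDocument:
    pages: List[MarkdownPage] = field(default_factory=list)


# Usage: a one-page document with no tables or figures
doc = MarkdownDocument(
    pages=[MarkdownPage(metadata=PageMetadata(page_number=0, language="en"), content="# Title")]
)
print(len(doc.pages), doc.pages[0].metadata.page_number)  # 1 0
```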

2. MarkdownTable

Each table is represented in the page content by a <Table id="tb-{id}"/> placeholder tag, while the table itself (metadata, headers, data rows, and bounding box) is stored in the page's tables list. This keeps the table's structured data intact without interrupting the document's reading flow.
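Because each row in data is keyed by header id rather than by position, rendering a table back to Markdown means ordering the headers by their column index and looking each cell up by id. The sketch below assumes the JSON shapes shown on this page (headers as a list of objects, data as a list of row objects); table_to_markdown is a hypothetical helper, and the sample headers, dtypes, and values are made up.

```python
from typing import Any, Dict, List


def table_to_markdown(headers: List[Dict[str, Any]], data: List[Dict[str, Any]]) -> str:
    """Render headers/data (shapes assumed from this page) as a Markdown pipe table."""
    # Order columns by their `column` index, not by list order.
    ordered = sorted(headers, key=lambda h: h["column"])
    lines = [
        "| " + " | ".join(h["name"] for h in ordered) + " |",
        "|" + "|".join("---" for _ in ordered) + "|",
    ]
    for row in data:
        # Each row maps header ids to values; missing cells render as empty strings.
        cells = [str(row.get(h["id"], "")) for h in ordered]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical sample: headers deliberately listed out of column order
headers = [
    {"id": "h2", "column": 1, "name": "Qty", "dtype": "int"},
    {"id": "h1", "column": 0, "name": "Item", "dtype": "string"},
]
data = [{"h1": "apple", "h2": 3}, {"h1": "pear"}]
print(table_to_markdown(headers, data))
```

Rendering a missing cell as empty seems a safer default than raising when a row omits a value, but a stricter consumer might prefer to flag it.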

3. Charts and Figures

Charts and figures are represented in the content by a <Chart id="ch-{id}"/> placeholder tag. The chart details are stored in the page's figures list, including its id, title, caption, and bounding box (bbox).
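The placeholder tags can be located with a regular expression and matched back to the page's tables and figures. This is a sketch under two assumptions not confirmed here: that tags appear exactly in the <Table id="tb-{id}"/> and <Chart id="ch-{id}"/> forms, and that the {id} in a chart tag equals the figure's integer id. find_placeholders and figure_for_chart are hypothetical helpers.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Matches <Table id="tb-..."/> and <Chart id="ch-..."/>; optional whitespace
# before "/>" is tolerated as an assumption about the exact tag format.
PLACEHOLDER = re.compile(r'<(Table|Chart) id="(?:tb|ch)-([^"]+)"\s*/>')


def find_placeholders(content: str) -> List[Tuple[str, str]]:
    """Return (kind, id) pairs in document order, e.g. ("Chart", "0")."""
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER.finditer(content)]


def figure_for_chart(figures: List[Dict[str, Any]], chart_id: str) -> Optional[Dict[str, Any]]:
    """Look up a figure by id, assuming ch-{id} uses the figure's `id` field."""
    for fig in figures:
        if str(fig.get("id")) == chart_id:
            return fig
    return None


content = 'Intro text\n<Table id="tb-0"/>\nMore text\n<Chart id="ch-0"/>'
figures = [{"id": 0, "title": "Sample Bar Chart", "caption": "Example visualization"}]
print(find_placeholders(content))               # [('Table', '0'), ('Chart', '0')]
print(figure_for_chart(figures, "0")["title"])  # Sample Bar Chart
```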

Example Usage

Here’s an example of processing a PDF with the Python SDK and retrieving the result as a MarkdownDocument:
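from pathlib import Path

from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, MarkdownDocument

# Initialize the client
client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Submit the document for processing (batch=True submits it asynchronously)
response: PredictionResponse = client.document.generate(
    file=Path("document.pdf"),
    domain="document.markdown",
    batch=True,
)

# Wait for the prediction to complete; this returns a PredictionResponse
prediction: PredictionResponse = client.predictions.wait(response.id, timeout=120)

# The extracted data is in prediction.response; parse it into a MarkdownDocument
doc = MarkdownDocument.model_validate(prediction.response)
print(doc.model_dump_json(indent=2))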
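Once the pages are available as plain Python data (for instance by calling model_dump() on a parsed MarkdownDocument, assuming it is a Pydantic model as the model_dump_json call suggests), walking the document is ordinary dictionary access. summarize_pages is a hypothetical helper and the sample dict is made up.

```python
from typing import Any, Dict, List


def summarize_pages(document: Dict[str, Any]) -> List[str]:
    """One line per page: page number plus counts of tables, figures, and content chars."""
    lines = []
    for page in document.get("pages", []):
        meta = page.get("metadata", {})
        lines.append(
            f"page {meta.get('page_number')}: "
            f"{len(page.get('tables', []))} table(s), "
            f"{len(page.get('figures', []))} figure(s), "
            f"{len(page.get('content', ''))} chars"
        )
    return lines


# Hypothetical sample in the documented shape
sample = {"pages": [{"metadata": {"page_number": 0}, "tables": [{}], "figures": [], "content": "abc"}]}
for line in summarize_pages(sample):
    print(line)  # page 0: 1 table(s), 0 figure(s), 3 chars
```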

Example JSON Response

Here’s an example of how a MarkdownDocument appears in a JSON response (the // comments and ... entries are annotations and omissions, not literal JSON):
{
  "pages": [
    {                                 // page 0
      "metadata": {
        "page_number": 0
      },
      "tables": [
        {
          "metadata": {
            "title": "Sample Data Table",
            "caption": "Table showing example data"
          },
          "content": "| Header 1 | Header 2 |\n|----------|----------|\n| Data 1   | Data 2   |\n| Data 3   | Data 4   |",
          "headers": [
            {
              "id": "h1",
              "column": 0,
              "name": "Header 1",
              "dtype": "string"
            },
            ...
          ],
          "data": [
            {
              "h1": "Data 1",
              "h2": "Data 2"
            },
            ...
          ]
        }
      ],
      "figures": [
        {
          "id": 0,
          "title": "Sample Bar Chart",
          "caption": "Example visualization",
          "content": "..."
        },
        ...
      ],
      "content": "..."
    },
    {                                 // page 1
      ...
    },
    {                                 // page 2
      ...
    },
    ...
  ]
}
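Since the example above contains comments and ... placeholders, it cannot be parsed directly. The sketch below instead parses a small, made-up payload in the same shape and re-keys each data row from header ids to header names, which is often handier downstream; rows_by_name is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload following the documented shape (values are made up)
payload = """
{
  "pages": [
    {
      "metadata": {"page_number": 0},
      "tables": [
        {
          "metadata": {"title": "Inventory"},
          "headers": [
            {"id": "h1", "column": 0, "name": "Item", "dtype": "string"},
            {"id": "h2", "column": 1, "name": "Qty", "dtype": "int"}
          ],
          "data": [{"h1": "apple", "h2": 3}]
        }
      ],
      "figures": [],
      "content": "..."
    }
  ]
}
"""


def rows_by_name(table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Re-key each data row from header ids to header names (unknown ids kept as-is)."""
    id_to_name = {h["id"]: h["name"] for h in table["headers"]}
    return [{id_to_name.get(k, k): v for k, v in row.items()} for row in table["data"]]


doc = json.loads(payload)
table = doc["pages"][0]["tables"][0]
print(rows_by_name(table))  # [{'Item': 'apple', 'Qty': 3}]
```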