Structured Table Extraction with Layout Preservation
Leverage vlm-1 with the TableWithLayout schema to extract complex tables, preserving hierarchical structure and enabling robust downstream processing.
Extracting data from tables within documents presents a significant challenge, especially with dense, complex layouts involving nested headers, merged cells, and implicit structural hierarchies. Simple text extraction or basic table parsing often fails to capture the full semantic and structural integrity required for reliable downstream analysis and processing. `vlm-1`, coupled with the specialized `TableWithLayout` schema, provides a robust solution designed for experts who demand high fidelity and structural preservation.
The Challenge: Beyond Simple Extraction
Traditional methods often flatten table structures, losing critical layout information:
- Nested Headers: Hierarchical column relationships are lost.
- Merged Cells: Spanning information is discarded, breaking row/column alignment.
- Semantic Grouping: Visual cues indicating related data are ignored.
This loss of information hinders accurate data analysis, comparison across documents, and seamless integration into data pipelines (e.g., Pandas DataFrames).
`MarkdownTable`: A Schema for Structure and Semantics
To address these challenges, we introduce a new schema called `MarkdownTable`, meticulously designed to capture not just the cell content but also the table’s intrinsic structure and metadata.
Key Design Benefits:
- Hierarchical Headers (`headers`):
  - The `name` field uses `>` to represent nesting (e.g., `Performance > Max Value`).
  - The unique `id` provides a stable reference for each column, crucial for programmatic access and comparison.
  - The `column` index and `dtype` add essential metadata for data validation and processing.
- Layout-Preserving Markdown (`content`):
  - The table is rendered as GitHub-flavored markdown.
  - Crucially, spanned cells are handled by repeating the cell value across the spanned rows/columns. This ensures the markdown table has a regular grid structure, directly loadable into structures like Pandas DataFrames without complex parsing or reconstruction.
  - The first row of the markdown contains only the unique `id`s from the `headers` list, providing a clean mapping for data ingestion.
- Rich Context (`metadata`): Captures titles, captions, and notes often surrounding tables, providing essential context that might be lost otherwise.
- Downstream Interoperability: The combination of structured `headers` and the regularized markdown `content` facilitates seamless conversion to Pandas DataFrames, database schemas, or input for further LLM analysis.
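The span-repetition idea above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the schema's actual implementation: `SpanCell`, `expand_spans`, `to_markdown`, the column ids `col_0`..`col_2`, and the sample values are all hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SpanCell:
    """A source cell with its top-left position and span (hypothetical type)."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_spans(cells: list[SpanCell], n_rows: int, n_cols: int) -> list[list[str]]:
    """Copy each cell's text into every grid position it covers."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                grid[r][c] = cell.text
    return grid


def to_markdown(column_ids: list[str], grid: list[list[str]]) -> str:
    """Emit a GitHub-flavored markdown table whose header row holds column ids."""
    lines = [
        "| " + " | ".join(column_ids) + " |",
        "| " + " | ".join("---" for _ in column_ids) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in grid]
    return "\n".join(lines)


# Made-up sample table: "Voltage" is one merged cell covering two rows.
cells = [
    SpanCell(0, 0, "Voltage", rowspan=2),
    SpanCell(0, 1, "1.0"), SpanCell(0, 2, "3.3"),
    SpanCell(1, 1, "1.2"), SpanCell(1, 2, "5.0"),
    SpanCell(2, 0, "Current"), SpanCell(2, 1, "0.1"), SpanCell(2, 2, "0.5"),
]
grid = expand_spans(cells, n_rows=3, n_cols=3)
markdown = to_markdown(["col_0", "col_1", "col_2"], grid)
print(markdown)
```

Because every row ends up with the same number of cells, a consumer can split each line on `|` without tracking which cells were merged in the original layout.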
Extracting Structured Tables via SDK
You can leverage `vlm-1` with the `TableWithLayout` schema using the Python SDK. Specify the schema in the `GenerationConfig`.
Example: Structured Output
Consider the following table with nested headers and merged cells.
Extracting Dense Tables in a Technical Document
The `MarkdownTable` output captures this complexity:
Example: Rendered Output
The `MarkdownTable` object contained in the `MarkdownPage` schema also includes a `render` method that renders the table as a markdown string.
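To make the idea of rendering concrete, here is a rough sketch of one plausible approach: swapping the `id` header row for the human-readable `name` values. The function `render_with_names`, the header records, and the sample content are assumptions made for illustration; the actual `render` method may behave differently.

```python
def render_with_names(content: str, headers: list[dict]) -> str:
    """Replace the id header row of a markdown table with display names."""
    id_to_name = {h["id"]: h["name"] for h in headers}
    lines = content.strip().splitlines()
    # The first line is assumed to hold the column ids, one per cell.
    ids = [cell.strip() for cell in lines[0].strip().strip("|").split("|")]
    # Fall back to the raw id if no matching header record exists.
    names = [id_to_name.get(col_id, col_id) for col_id in ids]
    lines[0] = "| " + " | ".join(names) + " |"
    return "\n".join(lines)


# Hypothetical header records and content, shaped like the fields described earlier.
headers = [
    {"id": "col_0", "name": "Parameter"},
    {"id": "col_1", "name": "Performance > Min"},
    {"id": "col_2", "name": "Performance > Typ"},
    {"id": "col_3", "name": "Performance > Max"},
]
content = (
    "| col_0 | col_1 | col_2 | col_3 |\n"
    "| --- | --- | --- | --- |\n"
    "| Voltage | 1.0 | 2.5 | 3.3 |\n"
)
rendered = render_with_names(content, headers)
print(rendered)
```

The sketch splits on `|` directly, so it does not handle cells that contain escaped pipe characters.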
Benefits Demonstrated:
- Nested Headers: `Performance > Min`, `Performance > Typ`, `Performance > Max` clearly show the hierarchy under `Performance`.
- Markdown Ready: The `render` method returns a string that is valid GitHub-flavored markdown.
- Pandas Integration: The `data` field in the `MarkdownTable` object can be easily read into a Pandas DataFrame, with appropriate header metadata such as unique `id`, `column` index, `name`, and `dtype`. We provide a convenience method `to_dataframe` to convert the `MarkdownTable` object to a Pandas DataFrame.
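For readers who want to see what such a conversion might involve, the sketch below parses an id-headed markdown table, casts columns by `dtype`, and turns `A > B` names into a two-level Pandas column index. It is not the SDK's `to_dataframe`: the helper `table_to_dataframe`, the `dtype` strings (`"str"`, `"float"`), and the sample data are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical header records and content, shaped like the fields described above.
headers = [
    {"id": "col_0", "column": 0, "name": "Parameter", "dtype": "str"},
    {"id": "col_1", "column": 1, "name": "Performance > Min", "dtype": "float"},
    {"id": "col_2", "column": 2, "name": "Performance > Max", "dtype": "float"},
]
content = """\
| col_0 | col_1 | col_2 |
| --- | --- | --- |
| Voltage | 1.0 | 3.3 |
| Current | 0.1 | 0.5 |
"""

# Assumed mapping from dtype strings to Python types.
DTYPES = {"str": str, "float": float, "int": int}


def table_to_dataframe(content: str, headers: list[dict]) -> pd.DataFrame:
    """Build a DataFrame with typed columns and hierarchical column labels."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in content.strip().splitlines()
    ]
    ids, _separator, *body = rows
    df = pd.DataFrame(body, columns=ids)

    by_id = {h["id"]: h for h in headers}
    for col_id in ids:
        df[col_id] = df[col_id].astype(DTYPES[by_id[col_id]["dtype"]])

    # Split "A > B" into ("A", "B") and pad shorter paths with empty strings.
    paths = [tuple(p.strip() for p in by_id[c]["name"].split(">")) for c in ids]
    depth = max(len(p) for p in paths)
    padded = [p + ("",) * (depth - len(p)) for p in paths]
    df.columns = pd.MultiIndex.from_tuples(padded)
    return df


df = table_to_dataframe(content, headers)
print(df["Performance"])  # selects the Min and Max sub-columns
```

Keeping the `id` row as the first line of `content` means the lookup from column to header metadata is a plain dictionary access, independent of how the display names are nested.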
Fine-tuning for Domain Specificity
While `vlm-1` offers strong general table extraction capabilities, optimal performance on highly specialized or uniquely formatted tables (e.g., specific financial reports, legacy scientific documents) can be achieved through fine-tuning. Consult our fine-tuning guides to adapt the model to your specific table structures and document types, maximizing accuracy and structural fidelity using the `TableWithLayout` schema.
Try our Document -> JSON API today
Head over to our Document -> JSON API to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.