vlm-1, coupled with the specialized TableWithLayout schema, provides a robust solution designed for experts who demand high fidelity and structural preservation.
The Challenge: Beyond Simple Extraction
Traditional methods often flatten table structures, losing critical layout information:
- Nested Headers: Hierarchical column relationships are lost.
- Merged Cells: Spanning information is discarded, breaking row/column alignment.
- Semantic Grouping: Visual cues indicating related data are ignored.
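To make the merged-cell problem concrete, the sketch below expands a small table with row and column spans into a regular grid by repeating each spanned value. The cell representation `(value, rowspan, colspan)` and the function name `expand_spans` are hypothetical, chosen only for illustration; they are not a format or API of vlm-1.

```python
from typing import List, Optional, Tuple

# Hypothetical cell format for illustration: (value, rowspan, colspan).
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[str]]:
    """Expand merged cells into a regular grid by repeating their value."""
    grid: List[List[Optional[str]]] = []
    for r, row in enumerate(rows):
        while len(grid) <= r:
            grid.append([])
        c = 0
        for value, rowspan, colspan in row:
            # Skip slots already filled by a span that started in an earlier row.
            while c < len(grid[r]) and grid[r][c] is not None:
                c += 1
            for dr in range(rowspan):
                while len(grid) <= r + dr:
                    grid.append([])
                target = grid[r + dr]
                # Pad the target row so the span's columns exist.
                while len(target) < c + colspan:
                    target.append(None)
                for dc in range(colspan):
                    target[c + dc] = value
            c += colspan
    # Any slot never covered by a cell becomes an empty string.
    return [[cell if cell is not None else "" for cell in row] for row in grid]


rows = [
    [("Parameter", 2, 1), ("Performance", 1, 2)],  # two-row and two-column spans
    [("Min", 1, 1), ("Max", 1, 1)],
    [("Voltage", 1, 1), ("1.8", 1, 1), ("3.3", 1, 1)],
]
grid = expand_spans(rows)
for line in grid:
    print(" | ".join(line))
```

Every row of the result has the same number of columns, which is the property a flattened extraction typically loses when span information is discarded.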
MarkdownTable: A Schema for Structure and Semantics
To address these challenges, we introduce a new schema called MarkdownTable, meticulously designed to capture not just the cell content but also the table’s intrinsic structure and metadata.
Key Design Benefits:
- Hierarchical Headers (headers):
  - The name field uses > to represent nesting (e.g., Performance > Max Value).
  - The unique id provides a stable reference for each column, crucial for programmatic access and comparison.
  - The column index and dtype add essential metadata for data validation and processing.
- Layout-Preserving Markdown (content):
  - The table is rendered as GitHub-flavored markdown.
  - Crucially, spanned cells are handled by repeating the cell value across the spanned rows/columns. This ensures the markdown table has a regular grid structure, directly loadable into structures like Pandas DataFrames without complex parsing or reconstruction.
  - The first row of the markdown only contains the unique ids from the headers list, providing a clean mapping for data ingestion.
- Rich Context (metadata): Captures titles, captions, and notes often surrounding tables, providing essential context that might be lost otherwise.
- Downstream Interoperability: The combination of structured headers and the regularized markdown content facilitates seamless conversion to Pandas DataFrames, database schemas, or input for further LLM analysis.
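As a rough illustration of why the regular grid matters, the sketch below parses a small markdown table whose first row holds column ids and casts each cell using the matching header's dtype. The header dictionaries reuse the field names described above, but the sample values, the dtype strings ("str", "float"), and the helper names are assumptions made for this example, not output from vlm-1 or part of its SDK.

```python
from typing import Any, Dict, List

# Hypothetical header entries using the field names described above;
# the dtype strings ("str", "float") are assumed for illustration.
headers = [
    {"id": "c0", "column": 0, "name": "Parameter", "dtype": "str"},
    {"id": "c1", "column": 1, "name": "Performance > Min", "dtype": "float"},
    {"id": "c2", "column": 2, "name": "Performance > Max", "dtype": "float"},
]

# Regular-grid markdown: the first row holds header ids, and the spanned
# "Supply" cell has been repeated on both rows it covered.
content = """\
| c0 | c1 | c2 |
| --- | --- | --- |
| Supply | 1.8 | 3.3 |
| Supply | 2.5 | 5.0 |
"""

# Assumed mapping from dtype strings to Python constructors.
CASTERS = {"str": str, "int": int, "float": float}


def split_row(line: str) -> List[str]:
    """Split one markdown table row into stripped cell strings.

    Simplified: does not handle escaped pipes inside cells.
    """
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(content: str, headers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn regular-grid markdown into typed records keyed by column id."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    ids = split_row(lines[0])
    dtype_by_id = {h["id"]: h["dtype"] for h in headers}
    records = []
    for line in lines[2:]:  # skip the id row and the --- separator row
        cells = split_row(line)
        records.append(
            {i: CASTERS.get(dtype_by_id.get(i, "str"), str)(v) for i, v in zip(ids, cells)}
        )
    return records


records = parse_table(content, headers)
print(records)
```

Because every row has the same columns and the keys are stable ids, a list of records like this can be handed directly to `pandas.DataFrame(records)` with no span reconstruction.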
Extracting Structured Tables via SDK
You can leverage vlm-1 with the TableWithLayout schema using the Python SDK. Specify the schema in the GenerationConfig.
Example: Structured Output
Consider the following table with nested headers and merged cells.
Extracting Dense Tables in a Technical Document
MarkdownTable output captures this complexity:
Example: Rendered Output
The MarkdownTable object contained in the MarkdownPage schema also includes a render method that renders the table as a markdown string.
Benefits Demonstrated:
- Nested Headers: Performance > Min, Performance > Typ, Performance > Max clearly show the hierarchy under Performance.
- Markdown Ready: The render method returns a string that is valid GitHub-flavored markdown.
- Pandas Integration: The data field in the MarkdownTable object can be easily read into a Pandas DataFrame, with appropriate header metadata such as unique id, column index, name, and dtype. We provide a convenience method to_dataframe to convert the MarkdownTable object to a Pandas DataFrame.
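The nested header names can also be unpacked on the consumer side. The following is a minimal sketch, assuming header names use " > " as the nesting separator as described earlier; `split_header` is a hypothetical helper and not part of the SDK.

```python
from typing import Dict, List, Tuple


def split_header(name: str, depth: int) -> Tuple[str, ...]:
    """Split 'A > B' into ('A', 'B'), padding with '' up to `depth` levels."""
    parts = [p.strip() for p in name.split(">")]
    return tuple(parts + [""] * (depth - len(parts)))


names = ["Parameter", "Performance > Min", "Performance > Typ", "Performance > Max"]

# The deepest header determines how many levels every tuple needs.
depth = max(n.count(">") + 1 for n in names)
levels = [split_header(n, depth) for n in names]

# Group leaf columns under their top-level header.
groups: Dict[str, List[str]] = {}
for top, *rest in levels:
    groups.setdefault(top, []).append(rest[-1] if rest else "")

print(levels)
print(groups)
```

Equal-length tuples of this shape are what `pandas.MultiIndex.from_tuples` accepts, so they can be used as hierarchical column labels on a DataFrame built from the table.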
Fine-tuning for Domain Specificity
While vlm-1 offers strong general table extraction capabilities, optimal performance on highly specialized or uniquely formatted tables (e.g., specific financial reports, legacy scientific documents) can be achieved through fine-tuning. Consult our fine-tuning guides to adapt the model to your specific table structures and document types, maximizing accuracy and structural fidelity using the TableWithLayout schema.