Dense Table Extraction
Learn how to extract dense tables from documents while preserving their structure.
One of the major challenges in document processing is extracting structured data from dense tables: tables with so many rows and columns that the data is hard to extract accurately. VLM-1 addresses this problem by extracting dense tables from documents with their structure preserved. Let's take a look at an example of dense table extraction from a technical document.
Example showcasing dense table extraction.
As you can see in the visualized example above, VLM-1 can accurately extract dense tables with structure and dense text from documents. Now, let's take a look at the JSON output from the dense table extraction process. You will notice that the JSON output contains detailed information about the extracted lines, paragraphs, tables, and charts from the document, along with grounding information (bbox) for each element.
JSON Output
{
"id": "...",
"created_at": "...",
"completed_at": "...",
{
"description": "This page from the AD74413R data sheet provides typical performance characteristics for voltage output, including graphs and plots for INL vs. DAC Code, DNL vs. DAC Code, TUE vs. DAC Code, screw terminal voltage and SYNC pin voltage over time, and full-scale positive and negative steps.",
"title": "AD74413R Data Sheet: Typical Performance Characteristics - Voltage Output",
"page_number": 21,
"metadata": {
"contains_toc": false,
"contains_table": false,
"contains_diagram": true
},
"lines": [
{
"content": "Data Sheet",
"bbox": {
"xywh": [
0.07470588235294118,
0.7281818181818182,
0.8688235294117648,
0.21272727272727276
]
}
},
...
],
"paragraphs": [
...
],
"tables": [
...
],
"charts": [
...
]
}
}
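To make the grounding information concrete, here is a minimal Python sketch of how a page object shaped like the output above might be consumed. It rests on assumptions that the output itself does not confirm: that `xywh` holds four values (x, y, width, height) normalized to the page size, and that the page object has already been pulled out of the full response. The helper names `xywh_to_pixels` and `summarize_page`, the page dimensions, and the sample values are all hypothetical.

```python
from typing import Any


def xywh_to_pixels(
    xywh: list[float], page_width: int, page_height: int
) -> tuple[int, int, int, int]:
    """Scale an [x, y, w, h] box to pixel (left, top, right, bottom).

    Assumption: the values are fractions of the page width and height.
    """
    if len(xywh) != 4:
        raise ValueError(f"expected 4 values in xywh, got {len(xywh)}")
    x, y, w, h = xywh
    left = round(x * page_width)
    top = round(y * page_height)
    right = round((x + w) * page_width)
    bottom = round((y + h) * page_height)
    return left, top, right, bottom


def summarize_page(page: dict[str, Any]) -> dict[str, int]:
    """Count the extracted elements of each kind on one page."""
    kinds = ("lines", "paragraphs", "tables", "charts")
    return {kind: len(page.get(kind, [])) for kind in kinds}


# Hypothetical page object with made-up values, shaped like the JSON above.
page = {
    "page_number": 21,
    "lines": [
        {"content": "Data Sheet", "bbox": {"xywh": [0.25, 0.125, 0.5, 0.0625]}}
    ],
    "paragraphs": [],
    "tables": [],
    "charts": [],
}

counts = summarize_page(page)
print(counts)

for line in page["lines"]:
    # 1000 x 2000 is an arbitrary rendered page size chosen for illustration.
    box = xywh_to_pixels(line["bbox"]["xywh"], page_width=1000, page_height=2000)
    print(line["content"], box)
```

The same loop would apply to paragraphs, tables, and charts if their elements carry a bbox in the same form, which the text above suggests but the truncated sample does not show.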
Get Started with our Document -> JSON API
Head over to our Document -> JSON API to start building your own document processing pipeline with VLM-1. Sign up for access to our API here.