Learn how to extract visual groundings / citations from documents.
table
object corresponds to visually. This process is known as visual grounding.
For a hands-on tutorial, check out our Visual Grounding Notebook vlm-1
provides the ability to extract visual groundings from documents with a simple interface. The extracted visual groundings can be used to understand the context of the visual elements in the document and to link them to the corresponding textual content. Let’s take a look at an example showcasing visually grounding tables in a hardware spec-sheet.
Example showcasing visually grounding tables.
T1
and T2
) in the document. This information can be used to link the visual elements to the corresponding textual content in the document.
JSON Output