





Layout detection example on the Qwen2.5-VL technical report.
Usage Example
For layout detection, we strongly recommend using the Structured Outputs API, which returns layout elements and their bounding boxes in a structured, validated format.
Layout detection identifies headers, paragraphs, tables, lists, figures, and other document elements. The response schema includes bounding boxes, reading order, and more.
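The exact response schema is not reproduced here, so the following Python sketch only models a plausible per-page layout result with standard-library dataclasses, to show what "structured and validated" can mean on the client side. The field names (`category`, `bbox`, `reading_order`, `content`, `page`) and the helper `parse_page` are hypothetical; check the actual schema before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical schema for illustration only; the real field names may differ.
@dataclass
class LayoutElement:
    category: str          # element type, e.g. "header", "paragraph", "table"
    bbox: List[float]      # [x, y, w, h] in pixels
    reading_order: int     # position of the element in the reading sequence
    content: str = ""      # extracted text, if any

    def __post_init__(self) -> None:
        # Minimal client-side validation of the structured output.
        if len(self.bbox) != 4:
            raise ValueError("bbox must have exactly 4 values: x, y, w, h")
        if self.bbox[2] < 0 or self.bbox[3] < 0:
            raise ValueError("width and height must be non-negative")
        if self.reading_order < 0:
            raise ValueError("reading_order must be non-negative")


@dataclass
class PageLayout:
    page: int
    elements: List[LayoutElement] = field(default_factory=list)


def parse_page(raw: dict) -> PageLayout:
    """Build a validated PageLayout from a plain dict (e.g. decoded JSON)."""
    return PageLayout(
        page=raw["page"],
        elements=[LayoutElement(**e) for e in raw["elements"]],
    )
```

With made-up input values, `parse_page({"page": 0, "elements": [{"category": "title", "bbox": [72, 40, 468, 30], "reading_order": 0}]})` yields a `PageLayout` with one validated element, while a malformed `bbox` raises `ValueError` instead of propagating silently.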
FAQ
What layout elements are supported?
- Headers: H1 to H6 headers with hierarchical structure
- Paragraphs: Body text blocks, preserving the text flow
- Titles: Main title of the document
- Tables: Structured data with row/column detection
- Figures: Images, charts, diagrams, and visual elements
- Lists: Bulleted and numbered list structures
- Captions: Figure and table captions, associated with the elements they describe
- Footnotes: Footnotes with references and content
- Formulas: Mathematical formulas and equations
- Pictures: Images and visual elements
- Section Headers: Section headers and titles
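When post-processing results, it can help to map the supported element types onto a fixed set of labels so that unexpected values fail loudly. The Python sketch below does this with an `Enum`; the label strings (`"header"`, `"section_header"`, and so on) and the `category` field are assumptions for illustration, not the documented schema values.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, List


class ElementType(str, Enum):
    # Hypothetical label strings; check the actual schema for exact values.
    HEADER = "header"
    PARAGRAPH = "paragraph"
    TITLE = "title"
    TABLE = "table"
    FIGURE = "figure"
    LIST = "list"
    CAPTION = "caption"
    FOOTNOTE = "footnote"
    FORMULA = "formula"
    PICTURE = "picture"
    SECTION_HEADER = "section_header"


def count_by_type(labels: Iterable[str]) -> Dict[ElementType, int]:
    """Count detected elements per type; an unknown label raises ValueError."""
    return dict(Counter(ElementType(label) for label in labels))


def filter_type(elements: List[dict], kind: ElementType) -> List[dict]:
    """Keep only elements whose (hypothetical) 'category' field matches kind."""
    return [e for e in elements if ElementType(e["category"]) is kind]
```

For example, `filter_type(elements, ElementType.TABLE)` would select only the tables for downstream table extraction.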
What format do the bounding boxes come in?
The bounding boxes use the xywh format, where x and y are the coordinates of the top-left corner, and w and h are the width and height of the box. All values are in pixels, relative to the document image.
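Many image tools expect corner coordinates instead of xywh. The Python sketch below converts an xywh box to (x1, y1, x2, y2) corners and rescales a box when the page is rendered at a different resolution than the one analyzed. The helper names are hypothetical, and the scaling step assumes you know the pixel size of the analyzed image.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x, y, w, h) with a top-left corner to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def scale_xywh(box: Box, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Box:
    """Rescale a pixel box from src_size (width, height) to dst_size (width, height)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)
```

For instance, `xywh_to_xyxy((100, 50, 200, 80))` gives `(100, 50, 300, 130)`, which is the (left, upper, right, lower) tuple that Pillow's `Image.crop` accepts for cropping an element out of the page image.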
What is the reading order?
The reading order indicates the sequence in which elements should be read, following the natural document flow from top to bottom and left to right. This is useful for accessibility and content extraction.
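For content extraction, the reading order lets you reassemble page text in sequence. The Python sketch below sorts elements by a hypothetical `reading_order` field and joins their `content`. It also includes a naive top-to-bottom, left-to-right fallback based on the bounding box; this fallback is only a rough approximation, not how the service computes reading order, and it breaks on multi-column layouts.

```python
from typing import Dict, List


def extract_text(elements: List[Dict]) -> str:
    """Concatenate element text in reading order.

    Assumes each element has hypothetical 'reading_order' (int) and
    'content' (str) fields; elements without text are skipped.
    """
    ordered = sorted(elements, key=lambda e: e["reading_order"])
    return "\n\n".join(e["content"] for e in ordered if e.get("content"))


def naive_order(elements: List[Dict]) -> List[Dict]:
    """Fallback ordering by the bbox top-left corner: top-to-bottom, then left-to-right.

    Assumes a hypothetical 'bbox' field in [x, y, w, h] form. Only suitable
    for simple single-column pages.
    """
    return sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
```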
Can it process multi-page documents?
Yes. Layout detection can process multi-page documents: each page is analyzed separately, and the results include page-specific bounding boxes and reading order.
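Because bounding boxes and reading order are page-specific, a multi-page result is easiest to handle page by page. The Python sketch below groups a flat list of elements by a hypothetical `page` field and sorts each page by its own `reading_order`; both field names are assumptions, and the real response may already be nested per page.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_page(elements: List[Dict]) -> Dict[int, List[Dict]]:
    """Group elements by page and sort each page by its own reading order.

    Assumes hypothetical 'page' (int) and 'reading_order' (int) fields.
    Pages are returned in ascending page order.
    """
    pages: Dict[int, List[Dict]] = defaultdict(list)
    for element in elements:
        pages[element["page"]].append(element)
    return {
        page: sorted(items, key=lambda e: e["reading_order"])
        for page, items in sorted(pages.items())
    }
```

Keeping the page index alongside each box also avoids mixing up coordinates from different pages, since each box is relative to its own page image.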