UI Parsing

Analyze and understand user interface elements in screenshots and application images. Perfect for automated testing, design system validation, accessibility auditing, and mobile app analysis.

UI parsing example showing UI element detection and classification with interactive elements


UI VQA & Grounding

Web Interface

Examples of UI parsing for different interface types.

Usage Example

For UI parsing, we highly recommend using the Structured Outputs API so that the UI elements and their hierarchy are returned in a structured, validated format.

from pydantic import BaseModel, Field
from vlmrun.client import VLMRun

# Schema for a single detected UI element
class UIElement(BaseModel):
    type: str = Field(..., description="Type of UI element (button, input, text, etc.)")
    text: str | None = Field(None, description="Text content of the element")
    interactive: bool = Field(..., description="Whether the element is interactive")
    xywh: tuple[float, float, float, float] = Field(..., description="Bounding box coordinates")

# Schema for the full response: a flat list of detected elements
class UIResponse(BaseModel):
    elements: list[UIElement] = Field(..., description="List of detected UI elements")

# Initialize the VLM Run client
client = VLMRun(
    base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>"
)

# Parse UI elements in the image
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "Analyze all UI elements in this screenshot"},
            {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/web.ui-automation/win11.jpeg", "detail": "auto"}}
          ]
        }
    ],
    response_format={"type": "json_schema", "schema": UIResponse.model_json_schema()},
)

# Print the response
print(response.choices[0].message.content)

# Validate the response against the schema
print(UIResponse.model_validate_json(response.choices[0].message.content))
# Example output:
# UIResponse(elements=[UIElement(type="button", text="Sign In", interactive=True, xywh=(0.25, 0.5, 0.04, 0.02)), ...])
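Once the response is parsed, a common next step in automated testing is to keep only the interactive elements and map their boxes to pixel coordinates. The sketch below uses only the standard library and makes two assumptions that the example above does not confirm: that xywh is (x, y, width, height) normalized to the range 0 to 1, and that (x, y) is the top-left corner. The helper name interactive_pixel_boxes and the sample payload are hypothetical.

```python
import json


def interactive_pixel_boxes(content: str, width: int, height: int) -> list[dict]:
    """Return the interactive elements with boxes scaled to pixels.

    Assumption: xywh is (x, y, w, h) normalized to [0, 1], with (x, y)
    at the top-left corner of the element.
    """
    elements = json.loads(content)["elements"]
    boxes = []
    for el in elements:
        # Skip static elements such as labels and decorative text
        if not el["interactive"]:
            continue
        x, y, w, h = el["xywh"]
        boxes.append(
            {
                "type": el["type"],
                "text": el.get("text"),
                "box_px": (
                    round(x * width),
                    round(y * height),
                    round(w * width),
                    round(h * height),
                ),
            }
        )
    return boxes


# Hypothetical payload in the same shape as UIResponse
sample = json.dumps(
    {
        "elements": [
            {"type": "button", "text": "Sign In", "interactive": True, "xywh": [0.25, 0.5, 0.04, 0.02]},
            {"type": "text", "text": "Welcome", "interactive": False, "xywh": [0.1, 0.1, 0.3, 0.05]},
        ]
    }
)
print(interactive_pixel_boxes(sample, width=1920, height=1080))
```

In practice, the first argument would be response.choices[0].message.content, and width and height would be the dimensions of the screenshot that was sent.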

FAQ

What is UI Parsing?

UI Parsing is the process of analyzing screenshots and application images to detect and classify their UI elements, such as buttons, inputs, text, and other interactive components. A typical use is automated testing.

What is UI VQA & Grounding?

UI VQA (visual question answering) & Grounding is the process of asking specific questions about a screenshot or application image and locating the UI elements that answer them. This differs from UI parsing, which returns all UI elements. In most cases, you should use UI VQA & Grounding to get more accurate results.
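As a rough illustration of the difference, a grounding request can reuse the message shape from the Usage Example and simply swap the broad "analyze everything" prompt for a targeted question. The helper name build_grounding_messages and the question text are hypothetical. The image URL is the one used in the Usage Example.

```python
def build_grounding_messages(question: str, image_url: str) -> list[dict]:
    """Build a single-turn message list that asks one targeted question about an image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url, "detail": "auto"}},
            ],
        }
    ]


# Hypothetical question; a narrow prompt replaces "analyze all UI elements"
messages = build_grounding_messages(
    "Where is the Start button? Return its bounding box.",
    "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/web.ui-automation/win11.jpeg",
)
print(messages[0]["content"][0]["text"])
```

The resulting list would be passed as the messages argument of client.agent.completions.create, optionally with a response_format schema that holds only the elements the question asks about.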
