Detect objects, people, or other entities in images with precise bounding boxes and confidence scores. Ideal for inventory management, quality control, security applications, and automated visual inspection.

Object detections visualized.

[Image: persons detection example showing detected people with bounding boxes]
[Image: faces detection example showing detected faces with bounding boxes]

Example detections of objects, people, and faces.

Usage Example

For object detection, we highly recommend using the Structured Outputs API so that bounding boxes and confidence scores are returned in a structured, validated data format.
The following example also works for face and person detection; the response schema is identical to the object detection case.
import openai
from pydantic import BaseModel, Field

class Detection(BaseModel):
    label: str = Field(..., description="Name of the detected object")
    xywh: tuple[float, float, float, float] = Field(..., description="Normalized bounding box of the detection (x, y, width, height)")

class Detections(BaseModel):
    detections: list[Detection] = Field(..., description="List of detected objects")

# Initialize the client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Detect objects in the image
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "Detect all the donuts in this image"},
            {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.object-detection/donuts.png", "detail": "auto"}}
          ]
        }
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "detections", "schema": Detections.model_json_schema()}},
)

# Print the response
print(response.choices[0].message.content)

# Validate the response against the schema
detections = Detections.model_validate_json(response.choices[0].message.content)
print(detections)
# Example output (coordinates are normalized to [0, 1]):
# Detections(detections=[Detection(label="donut", xywh=(0.15, 0.20, 0.18, 0.25)), Detection(label="donut", xywh=(0.42, 0.22, 0.17, 0.24)), Detection(label="donut", xywh=(0.68, 0.21, 0.18, 0.26))])
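Once validated, the detections are plain Python objects that are easy to post-process. As a minimal sketch for an inventory-style tally, the detections can be counted per label (the JSON below is illustrative, standing in for the model's actual response content):

```python
import json
from collections import Counter

# Illustrative JSON, standing in for response.choices[0].message.content
content = json.dumps({
    "detections": [
        {"label": "donut", "xywh": [0.15, 0.20, 0.18, 0.25]},
        {"label": "donut", "xywh": [0.42, 0.22, 0.17, 0.24]},
        {"label": "apple", "xywh": [0.70, 0.60, 0.10, 0.12]},
    ]
})

# Tally detections per label for an inventory-style count
counts = Counter(d["label"] for d in json.loads(content)["detections"])
print(counts)  # Counter({'donut': 2, 'apple': 1})
```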

FAQ

What objects can be detected?

  • Common Objects: person, car, truck, bus, bicycle, motorcycle
  • Animals: dog, cat, bird, horse, cow, sheep
  • Food Items: apple, banana, sandwich, pizza, donut, cake
  • Electronics: laptop, phone, tv, keyboard, mouse
  • Furniture: chair, table, bed, sofa, desk
  • And the rest of the 80 COCO dataset classes
What format are the bounding boxes in?

Bounding boxes use the normalized xywh format, where x and y are the top-left corner of the box and w and h are its width and height. All values lie between 0 and 1: x and w are normalized by the image width, y and h by the image height.
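This convention can be sketched as a small helper that recovers absolute pixel coordinates from a normalized box (the function name and image size here are illustrative):

```python
def xywh_to_pixels(xywh, image_width, image_height):
    """Convert a normalized (x, y, w, h) box to absolute pixel coordinates."""
    x, y, w, h = xywh
    return (
        round(x * image_width),   # x and w scale with image width
        round(y * image_height),  # y and h scale with image height
        round(w * image_width),
        round(h * image_height),
    )

# A normalized box on a 1000x800 image
print(xywh_to_pixels((0.15, 0.20, 0.18, 0.25), 1000, 800))  # (150, 160, 180, 200)
```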
What do the confidence scores mean?

The confidence score is a value between 0 and 1 indicating how confident the model is in a given detection.
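If your response schema includes a confidence field (an assumption here; the schema in the usage example above does not include one), a simple threshold filter discards low-confidence detections:

```python
# Illustrative detections with an assumed "confidence" field
detections = [
    {"label": "donut", "confidence": 0.92},
    {"label": "donut", "confidence": 0.35},
    {"label": "apple", "confidence": 0.81},
]

CONFIDENCE_THRESHOLD = 0.5  # tune per application

# Keep only detections the model is reasonably sure about
kept = [d for d in detections if d["confidence"] >= CONFIDENCE_THRESHOLD]
print([d["label"] for d in kept])  # ['donut', 'apple']
```

A higher threshold trades recall for precision: raise it for applications like security alerts where false positives are costly, lower it for exhaustive inventory counts.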