Pointing

Detect and localize keypoints of objects, people or faces in images with precise coordinate mapping. Ideal for counting, localization and salience detection.

Pointing example showing object keypoints.

Object Localization

Person Localization

Face Localization

Examples of keypoint detection for objects, people or faces.

Usage Example

For keypoint prediction, we highly recommend using the Structured Outputs API to get the keypoints and skeleton in a structured and validated data format.

# Install the SDK first (shell): pip install vlmrun

from pydantic import BaseModel, Field
from vlmrun.client import VLMRun

class KeyPoint(BaseModel):
  xy: tuple[float, float] = Field(..., description="Normalized keypoint coordinates [x, y], each between 0 and 1")
  label: str = Field(..., description="Label of the keypoint (e.g. 'person', 'car')")

class Keypoints(BaseModel):
  keypoints: list[KeyPoint] = Field(..., description="List of keypoints with their x and y coordinates and labels")

# Initialize the VLM Run client
client = VLMRun(
    base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>"
)

# Predict the keypoints in the image
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "Point to all the cars and doors in this image"},
            {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/car.jpg", "detail": "auto"}}
          ]
        }
    ],
    response_format={"type": "json_schema", "schema": Keypoints.model_json_schema()},
)

# Raw JSON string returned by the model
print(response.choices[0].message.content)
# Example output:
# {"keypoints": [{"xy": [0.5, 0.5], "label": "car"}, {"xy": [0.6, 0.6], "label": "door"}]}

# Parse and validate the JSON against the Keypoints schema
print(Keypoints.model_validate_json(response.choices[0].message.content))
# Example output:
# Keypoints(keypoints=[KeyPoint(xy=(0.5, 0.5), label='car'), KeyPoint(xy=(0.6, 0.6), label='door')])
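
Since pointing is suited to counting, the returned JSON can be tallied per label. The following is a minimal sketch using only the standard library. `count_by_label` is a hypothetical helper (not part of the vlmrun SDK), and the `content` string is the illustrative payload from the example output above rather than a real API result.

```python
import json
from collections import Counter


def count_by_label(content: str) -> Counter:
    """Tally keypoints per label from the JSON string in message.content."""
    data = json.loads(content)
    return Counter(kp["label"] for kp in data["keypoints"])


# Illustrative payload copied from the example output (not a real API result)
content = (
    '{"keypoints": [{"xy": [0.5, 0.5], "label": "car"}, '
    '{"xy": [0.6, 0.6], "label": "door"}]}'
)

counts = count_by_label(content)
print(counts["car"], counts["door"])
```

A `Counter` returns 0 for labels that never appear, so looking up a label the model did not point to is safe.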

FAQ

What format do the keypoints come in?

The keypoints are returned as a list, where each entry has a normalized xy coordinate pair and a label. x and y give the position of the point itself: x is normalized by the image width and y by the image height, so all values lie between 0 and 1.
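
To draw or crop around a keypoint, the normalized values usually need to be mapped back to pixels. The sketch below shows one way to do that. `to_pixels` is a hypothetical helper, the 1280x720 image size is an assumed value for illustration, and clamping to the last valid pixel index is a design choice rather than documented API behavior.

```python
def to_pixels(xy: tuple[float, float], width: int, height: int) -> tuple[int, int]:
    """Map a normalized (x, y) keypoint to integer pixel indices."""
    x, y = xy
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"expected normalized coordinates in [0, 1], got {xy}")
    # x scales with the image width, y with the image height.
    # Clamp so that a value of exactly 1.0 maps to the last valid index.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


# Assumed image size for illustration only
WIDTH, HEIGHT = 1280, 720

center = to_pixels((0.5, 0.5), WIDTH, HEIGHT)
corner = to_pixels((1.0, 1.0), WIDTH, HEIGHT)
print(center, corner)
```

In practice the width and height would come from the image you sent, for example from the size reported by your image library.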

What other tags can you extract for each keypoint?

You can extract the following tags for each keypoint:

Object Name: The name of the object that the keypoint belongs to. For example, “car”, “door”, “person”, “face”, etc.
Confidence Score: The confidence score of the keypoint detection.
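
If a confidence score is requested for each keypoint, low-confidence points can be dropped before further processing. The sketch below assumes the schema was extended with a hypothetical `confidence` field holding a float between 0 and 1. The field name, the threshold, and the payload values are illustrative assumptions, not confirmed API output.

```python
import json


def filter_by_confidence(content: str, threshold: float = 0.5) -> list[dict]:
    """Keep only keypoints whose (hypothetical) confidence meets the threshold."""
    data = json.loads(content)
    # Keypoints without a confidence value are treated as 0.0 and dropped.
    return [
        kp for kp in data["keypoints"]
        if kp.get("confidence", 0.0) >= threshold
    ]


# Illustrative payload; the "confidence" field is an assumed schema extension
content = (
    '{"keypoints": ['
    '{"xy": [0.5, 0.5], "label": "car", "confidence": 0.92}, '
    '{"xy": [0.6, 0.6], "label": "door", "confidence": 0.31}]}'
)

kept = filter_by_confidence(content, threshold=0.5)
print([kp["label"] for kp in kept])
```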
