Detect and localize keypoints of objects, people or faces in images with precise coordinate mapping. Ideal for counting, localization and salience detection.

Pointing example showing object keypoints.

Examples of keypoint detection for objects, people or faces (panels: Object Localization, Person Localization, Face Localization).

Usage Example

For keypoint prediction, we highly recommend using the Structured Outputs API to get the keypoints and skeleton in a structured and validated data format.
!pip install openai pydantic

import openai
from pydantic import BaseModel, Field

class KeyPoint(BaseModel):
    xy: tuple[float, float] = Field(..., description="Normalized [x, y] keypoint coordinates, each between 0 and 1")
    label: str = Field(..., description="Label of the keypoint (e.g. 'person', 'car')")

class Keypoints(BaseModel):
    keypoints: list[KeyPoint] = Field(..., description="List of keypoints with their x and y coordinates and labels")

# Initialize the client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Predict the keypoints in the image
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "Point to all the cars and doors in this image"},
            {"type": "image_url", "image_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/car.jpg", "detail": "auto"}}
          ]
        }
    ],
    response_format={"type": "json_schema", "schema": Keypoints.model_json_schema()},
)

print(response.choices[0].message.content)
# Example output:
# {"keypoints": [{"xy": [0.5, 0.5], "label": "car"}, {"xy": [0.6, 0.6], "label": "door"}]}

print(Keypoints.model_validate_json(response.choices[0].message.content))
# Example output:
# keypoints=[KeyPoint(xy=(0.5, 0.5), label='car'), KeyPoint(xy=(0.6, 0.6), label='door')]
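Counting is listed as a primary use case, so here is a minimal sketch that tallies keypoints per label from the raw JSON string returned in response.choices[0].message.content. It uses only the standard library and assumes the response follows the Keypoints schema above. count_by_label is a hypothetical helper for illustration, not part of any SDK.

```python
import json
from collections import Counter


def count_by_label(content: str) -> Counter:
    """Count keypoints per label in a JSON string shaped like the Keypoints schema."""
    data = json.loads(content)
    # Each entry is expected to look like {"xy": [x, y], "label": "..."}
    return Counter(kp["label"] for kp in data["keypoints"])


# Example using the sample response shown above
content = '{"keypoints": [{"xy": [0.5, 0.5], "label": "car"}, {"xy": [0.6, 0.6], "label": "door"}]}'
print(count_by_label(content))  # Counter({'car': 1, 'door': 1})
```

If you have already validated the response with Keypoints.model_validate_json, the same count is Counter(kp.label for kp in parsed.keypoints).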

FAQ

The keypoints are returned as a list of objects, each with its coordinates and label. Coordinates are normalized xy values, where x and y give the position of the keypoint measured from the top-left corner of the image. All values lie between 0 and 1: x is normalized by the image width and y by the image height.
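To draw or crop around a keypoint you usually need pixel coordinates. The sketch below converts a normalized (x, y) pair back to integer pixel indices by scaling x with the image width and y with the image height. to_pixels is a hypothetical helper; flooring and clamping to the last row and column are design choices made here, not behavior documented by the API.

```python
def to_pixels(xy: tuple[float, float], width: int, height: int) -> tuple[int, int]:
    """Convert normalized [x, y] coordinates (0 to 1) to integer pixel coordinates."""
    x, y = xy
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"coordinates must be normalized to [0, 1], got {xy}")
    # x scales with image width, y with image height; origin is the top-left corner.
    # Clamp so that a value of exactly 1.0 maps to the last pixel instead of one past it.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


print(to_pixels((0.5, 0.5), 640, 480))  # (320, 240)
```

If you need sub-pixel precision (for example, for overlays drawn with anti-aliasing), return x * width and y * height as floats instead of flooring them.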
You can extract the following tags for each keypoint:
  • Object Name: The name of the object that the keypoint belongs to. For example, “car”, “door”, “person”, “face”, etc.
  • Confidence Score: The confidence score of the keypoint detection.
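The tags above are typically used to post-filter detections. The following is a minimal sketch, assuming each keypoint carries a label and a confidence value between 0 and 1. The Keypoints schema in the usage example does not define a confidence field, so the field name confidence, its range, and the ScoredKeypoint and filter_keypoints names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ScoredKeypoint:
    """Hypothetical keypoint record carrying the tags listed above."""
    xy: tuple[float, float]
    label: str          # object name, e.g. "car"
    confidence: float   # assumed to lie in [0, 1]; not part of the Keypoints schema above


def filter_keypoints(
    kps: Iterable[ScoredKeypoint],
    min_confidence: float = 0.5,
    labels: Optional[Iterable[str]] = None,
) -> List[ScoredKeypoint]:
    """Keep keypoints at or above min_confidence, optionally restricted to the given labels."""
    allowed = set(labels) if labels is not None else None
    return [
        kp for kp in kps
        if kp.confidence >= min_confidence and (allowed is None or kp.label in allowed)
    ]


kps = [
    ScoredKeypoint((0.5, 0.5), "car", 0.92),
    ScoredKeypoint((0.6, 0.6), "door", 0.41),
]
print([kp.label for kp in filter_keypoints(kps)])  # ['car']
```

Filtering by threshold before counting avoids inflating counts with low-confidence points.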