Skills work with every VLM Run API generation endpoint (api.vlm.run). Pass one or more skills in the config.skills parameter, and the request automatically uses each skill’s prompt and JSON schema.

Image → JSON

Extract structured JSON from images:
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.image.generate(
    images=[Image.open("photo.jpg")],
    model="vlm-1",
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction", version="latest")]
    ),
)

Document → JSON

Extract structured JSON from documents:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.document.generate(
    file=Path("invoice.pdf"),
    model="vlm-1",
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction", version="latest")]
    ),
)

Video → JSON

Extract structured JSON from videos:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.video.generate(
    file=Path("recording.mp4"),
    model="vlm-1",
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="meeting-notes", version="latest")]
    ),
    batch=True,
)

When skills are provided and domain is omitted, the platform creates a dynamic application from the skill’s prompt and JSON schema. You do not need to specify a domain.
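
To make the "skills without a domain" rule concrete, here is a minimal sketch of how such a request body might be assembled. It uses only the standard library. SkillRef and build_generate_payload are hypothetical names invented for illustration and are not part of the vlmrun SDK. The field names skill_name and version mirror the AgentSkill arguments shown above, but the exact wire format is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SkillRef:
    """Hypothetical stand-in for AgentSkill; not the SDK type."""
    skill_name: str
    version: str = "latest"


def build_generate_payload(
    model: str,
    skills: List[SkillRef],
    domain: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an illustrative request body (assumed shape, not the real API)."""
    payload: Dict[str, Any] = {
        "model": model,
        # Skills travel under config.skills, as described in the text.
        "config": {"skills": [asdict(s) for s in skills]},
    }
    # Leave the key out entirely when no domain is given, so the
    # skill's prompt and JSON schema drive the request.
    if domain is not None:
        payload["domain"] = domain
    return payload


payload = build_generate_payload("vlm-1", [SkillRef("invoice-extraction")])
print(json.dumps(payload, indent=2))
```

The sketch leaves the domain key out of the payload instead of sending an empty value, which matches the statement that a domain does not need to be specified when skills are present.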