Skip to main content
Skills are modular, reusable capabilities that provide vlm-1 with domain-specific expertise for visual extraction tasks. Instead of selecting a pre-defined domain, you reference a skill by name (and optionally pin a version) and the platform automatically applies the skill’s prompt and JSON schema to your request.

Why use Skills?

Skills let you decouple what you want extracted from how the extraction is configured:
  • Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent)
  • Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision
  • Composable: Pass multiple skills in a single request
  • Flexible: Use skills as an alternative to domains, or combine them with custom schemas
Skills are an alternative to domains. When you pass skills in your request config, you do not need to specify a domain — the skill’s built-in prompt and schema are used automatically.

Skill identifiers

Each skill can be referenced in two ways:
FieldDescriptionExample
skill_nameHuman-readable name for lookup"invoice-extraction"
skill_idUnique identifier (UUID or name string)"abc-123-def"
You must provide at least one of skill_name or skill_id. When using skill_name, you can also specify a version (defaults to "latest").

Using Skills

Image generation

Extract structured data from images using a skill instead of a domain:
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.image.generate(
    images=[Image.open("photo.jpg")],
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction")]
    )
)

Document generation

Extract structured data from PDFs and other documents:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.document.generate(
    file=Path("invoice.pdf"),
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction", version="latest")]
    )
)

Video generation

Process videos with skill-driven extraction:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.video.generate(
    file=Path("recording.mp4"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="meeting-notes")]
    )
)

Audio generation

Process audio files with skill-driven extraction:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.audio.generate(
    file=Path("call.mp3"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="call-summary")]
    )
)

Agent execution

Run an agent with skills to drive the extraction pipeline:
from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionConfig, AgentSkill

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

response = client.agent.execute(
    inputs={"file": {"type": "file_url", "file_url": {"url": "<file-url>"}}},
    config=AgentExecutionConfig(
        skills=[AgentSkill(skill_name="patient-referral", version="20260219-abc123")]
    ),
    batch=True,
)

Chat completions

Use skills in chat to provide domain-specific context:
from vlmrun.client import VLMRun

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[{"role": "user", "content": "Summarize this patient referral document."}],
    skills=[{"skill_name": "patient-referral"}],
)

Version pinning

By default, version is "latest", which resolves to the most recent revision of the skill. To pin a specific version for reproducibility:
from vlmrun.client.types import AgentSkill

skill = AgentSkill(skill_name="invoice-extraction", version="20260219-abc123")

Skills vs Domains

DomainsSkills
LookupFixed string (e.g. "document.invoice")Name + version (e.g. "invoice-extraction" @ "latest")
SchemaPre-defined per domainBundled with the skill
VersioningN/AExplicit version pinning
Custom promptsVia config.promptBuilt into the skill
Where to passdomain parameterconfig.skills list
When skills are provided and domain is omitted, the platform creates a dynamic application from the skill’s prompt and JSON schema. You can still pass domain alongside skills if needed.

AgentSkill reference

The AgentSkill object accepts the following fields:
FieldTypeDefaultDescription
skill_namestringnullHuman-readable skill name for lookup
skill_idstringnullUnique identifier (UUID or name string)
versionstring"latest"Skill version to use
typestring"vlm-run"Skill type
At least one of skill_name or skill_id must be provided. If both are given, skill_id takes precedence for resolution.