Skills are modular, reusable capabilities that provide vlm-1 with domain-specific expertise for visual extraction tasks. Instead of selecting a pre-defined domain, you reference a skill by name (and optionally pin a version) and the platform automatically applies the skill’s prompt and JSON schema to your request.
Why use Skills?
Skills let you decouple what you want extracted from how the extraction is configured:
- Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent)
- Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision
- Composable: Pass multiple skills in a single request
- Flexible: Use skills as an alternative to domains, or combine them with custom schemas
Skills are an alternative to domains. When you pass skills in your request config, you do not need to specify a domain — the skill’s built-in prompt and schema are used automatically.
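To illustrate the "Composable" point above, here is a minimal sketch that builds a skills list with more than one entry, using the plain dict form shown for chat completions on this page. The helper name skills_payload is hypothetical and not part of the SDK, and including a "version" key in the dict form is an assumption based on the AgentSkill fields.

```python
from typing import Dict, List


def skills_payload(*skill_names: str, version: str = "latest") -> List[Dict[str, str]]:
    """Build one skills entry per name (hypothetical helper, not an SDK function)."""
    # Assumption: the dict form accepts the same keys as the AgentSkill fields.
    return [{"skill_name": name, "version": version} for name in skill_names]


# Two skills referenced in a single request config.
payload = skills_payload("invoice-extraction", "call-summary")
print(payload)
```

The resulting list is what you would pass as the skills value of a request; the same idea applies to a list of AgentSkill objects.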
Skill identifiers
Each skill can be referenced in two ways:
| Field | Description | Example |
|---|---|---|
| skill_name | Human-readable name for lookup | "invoice-extraction" |
| skill_id | Unique identifier (UUID or name string) | "abc-123-def" |
You must provide at least one of skill_name or skill_id. When using skill_name, you can also specify a version (defaults to "latest").
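The rule above can be sketched client-side as follows. SkillRef is a local stand-in that mirrors the documented fields; it is an illustration only, not the SDK's AgentSkill class, and the error message is made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillRef:
    """Local stand-in mirroring the documented fields (not the SDK class)."""

    skill_name: Optional[str] = None
    skill_id: Optional[str] = None
    version: str = "latest"  # documented default

    def __post_init__(self) -> None:
        # Documented rule: at least one identifier must be present.
        if self.skill_name is None and self.skill_id is None:
            raise ValueError("provide at least one of skill_name or skill_id")


by_name = SkillRef(skill_name="invoice-extraction")
by_id = SkillRef(skill_id="abc-123-def")
print(by_name.version)
```

Checking the rule before sending a request lets a misconfigured skill reference fail locally rather than at the API.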
Using Skills
Image generation
Extract structured data from images using a skill instead of a domain:
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.image.generate(
    images=[Image.open("photo.jpg")],
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction")]
    ),
)
Document generation
Extract structured data from PDFs and other documents:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.document.generate(
    file=Path("invoice.pdf"),
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction", version="latest")]
    ),
)
Video generation
Process videos with skill-driven extraction:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.video.generate(
    file=Path("recording.mp4"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="meeting-notes")]
    ),
)
Audio generation
Process audio files with skill-driven extraction:
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")
response = client.audio.generate(
    file=Path("call.mp3"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="call-summary")]
    ),
)
Agent execution
Run an agent with skills to drive the extraction pipeline:
from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionConfig, AgentSkill

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")
response = client.agent.execute(
    inputs={"file": {"type": "file_url", "file_url": {"url": "<file-url>"}}},
    config=AgentExecutionConfig(
        skills=[AgentSkill(skill_name="patient-referral", version="20260219-abc123")]
    ),
    batch=True,
)
Chat completions
Use skills in chat to provide domain-specific context:
from vlmrun.client import VLMRun

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[{"role": "user", "content": "Summarize this patient referral document."}],
    skills=[{"skill_name": "patient-referral"}],
)
Version pinning
By default, version is "latest", which resolves to the most recent revision of the skill. To pin a specific version for reproducibility:
from vlmrun.client.types import AgentSkill
skill = AgentSkill(skill_name="invoice-extraction", version="20260219-abc123")
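One way to keep pins consistent across a codebase is a single lookup table that falls back to "latest" for anything unpinned. The sketch below assumes this pattern; PINNED_VERSIONS and skill_entry are hypothetical names, and the output is the plain dict form rather than an AgentSkill object.

```python
from typing import Dict, Optional

# Hypothetical pin table: skill name -> pinned version string.
PINNED_VERSIONS: Dict[str, str] = {"invoice-extraction": "20260219-abc123"}


def skill_entry(skill_name: str, pins: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a skill reference, pinned if listed, otherwise "latest"."""
    table = PINNED_VERSIONS if pins is None else pins
    return {"skill_name": skill_name, "version": table.get(skill_name, "latest")}


print(skill_entry("invoice-extraction"))  # pinned entry
print(skill_entry("meeting-notes"))       # falls back to "latest"
```

Keeping pins in one place means moving to a newer skill revision is a one-line change that can be reviewed and rolled back.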
Skills vs Domains
| | Domains | Skills |
|---|---|---|
| Lookup | Fixed string (e.g. "document.invoice") | Name + version (e.g. "invoice-extraction" @ "latest") |
| Schema | Pre-defined per domain | Bundled with the skill |
| Versioning | N/A | Explicit version pinning |
| Custom prompts | Via config.prompt | Built into the skill |
| Where to pass | domain parameter | config.skills list |
When skills are provided and domain is omitted, the platform creates a dynamic application from the skill’s prompt and JSON schema. You can still pass domain alongside skills if needed.
AgentSkill reference
The AgentSkill object accepts the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| skill_name | string | null | Human-readable skill name for lookup |
| skill_id | string | null | Unique identifier (UUID or name string) |
| version | string | "latest" | Skill version to use |
| type | string | "vlm-run" | Skill type |
At least one of skill_name or skill_id must be provided. If both are given, skill_id takes precedence for resolution.
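The precedence rule can be pictured with a small local function. This is only an illustration of the documented behavior, not the platform's resolution code; resolution_key is a hypothetical name, and ignoring version when skill_id is present is an assumption, since the page only describes version in combination with skill_name.

```python
from typing import Optional, Tuple


def resolution_key(
    skill_name: Optional[str] = None,
    skill_id: Optional[str] = None,
    version: str = "latest",
) -> Tuple[str, ...]:
    """Return which identifier would drive lookup (illustrative only)."""
    if skill_id is not None:
        # Documented rule: skill_id takes precedence when both are given.
        # Assumption: version is not consulted on this path.
        return ("skill_id", skill_id)
    if skill_name is not None:
        return ("skill_name", skill_name, version)
    raise ValueError("provide at least one of skill_name or skill_id")


print(resolution_key(skill_name="invoice-extraction", skill_id="abc-123-def"))
print(resolution_key(skill_name="invoice-extraction"))
```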