SKILL.md is the primary file in a skill directory. It combines YAML frontmatter for metadata with a Markdown body for instructions that guide the model or agent during execution.
Frontmatter Fields
The YAML frontmatter defines the skill’s metadata:
---
name: invoice-extraction
description: Extract structured data from invoice documents
version: "1.0"
license: MIT
---
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Skill identifier, used for lookup via `skill_name` |
| `description` | string | Yes | Concise description of what the skill does |
| `skill_version` | string | No | Skill version string |
| `license` | string | No | License type (e.g., MIT, Apache-2.0) |
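To make the file layout concrete, here is a minimal sketch of how a SKILL.md file could be split into frontmatter and body and checked for the two required fields. It is not the platform's actual loader: the function names (`split_skill_md`, `parse_frontmatter`, `missing_required`) and the body text in `EXAMPLE` are hypothetical, and the hand-written parser covers only the flat `key: value` pairs and simple `- item` lists shown on this page. A real loader would more likely use a full YAML parser.

```python
from typing import Dict, List, Tuple, Union

FrontmatterValue = Union[str, List[str]]

# Fields marked "Required: Yes" in the table above.
REQUIRED_FIELDS = ("name", "description")


def split_skill_md(text: str) -> Tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body) at the '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def parse_frontmatter(frontmatter: str) -> Dict[str, FrontmatterValue]:
    """Parse flat 'key: value' pairs and simple '- item' lists (assumed subset)."""
    fields: Dict[str, FrontmatterValue] = {}
    current_key = None
    for raw in frontmatter.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            if current_key is None or not isinstance(fields[current_key], list):
                raise ValueError(f"list item without a list key: {raw!r}")
            fields[current_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        current_key = key.strip()
        value = value.strip()
        # An empty value starts a list; otherwise drop surrounding quotes.
        fields[current_key] = [] if value == "" else value.strip("\"'")
    return fields


def missing_required(fields: Dict[str, FrontmatterValue]) -> List[str]:
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


# Frontmatter copied from the example above; the body is a made-up placeholder.
EXAMPLE = """---
name: invoice-extraction
description: Extract structured data from invoice documents
version: "1.0"
license: MIT
---
# Invoice Extraction
Extract the vendor, date, and total.
"""

frontmatter_text, body_text = split_skill_md(EXAMPLE)
example_fields = parse_frontmatter(frontmatter_text)
```

With this sketch, `missing_required(example_fields)` returns an empty list for the example, and a frontmatter block without `description` would be reported as `["description"]`.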
Toolset names that can appear in the `toolsets` frontmatter list:

| Toolset | Description |
|---|---|
| `core` | Basic operations (file I/O, text processing) |
| `document` | Document extraction and layout understanding |
| `image` | Image analysis and understanding |
| `image-gen` | Image generation and editing |
| `video` | Video analysis and understanding |
| `viz` | Visualization and annotation |
| `web` | Web search and retrieval |
| `world-gen` | World generation and editing |
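A skill author may want to catch a misspelled toolset name before publishing. The sketch below checks a requested list against the names in the table above. The helper `unknown_toolsets` is hypothetical, and how the platform itself treats an unrecognized name is not stated here.

```python
from typing import Iterable, List

# Names copied from the toolset table above.
KNOWN_TOOLSETS = {
    "core", "document", "image", "image-gen",
    "video", "viz", "web", "world-gen",
}


def unknown_toolsets(requested: Iterable[str]) -> List[str]:
    """Return the requested names that are not in the table, in input order."""
    return [name for name in requested if name not in KNOWN_TOOLSETS]
```

For example, `unknown_toolsets(["core", "video"])` returns an empty list, while a list containing a name outside the table returns that name.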
Markdown Body
The body after the frontmatter contains instructions that are injected into the model or agent prompt at execution time. Write clear, specific instructions for the extraction or analysis task.
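The exact prompt layout is not documented here, so the following is only an illustration of what "injected into the prompt" could look like. The function `build_prompt`, its parameters, and the section headings it emits are all assumptions made for this sketch.

```python
def build_prompt(system_prompt: str, skill_name: str, skill_body: str, task: str) -> str:
    """Assemble a prompt that places the skill body between the system text and the task.

    Hypothetical layout: the real runtime may order or label sections differently.
    """
    sections = [
        system_prompt.strip(),
        f"## Skill: {skill_name}\n{skill_body.strip()}",
        f"## Task\n{task.strip()}",
    ]
    return "\n\n".join(sections)
```

Whatever the real layout is, the model reads the body verbatim alongside the task, which is why vague or ambiguous instructions in the body carry straight through to the output.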
Example: Image Analysis Skill
---
name: pillow
description: Image manipulation toolkit using Pillow (PIL)
license: MIT
toolsets:
- image
---
# Pillow Image Processing
## Description
A comprehensive image processing skill using the Pillow library.
## Capabilities
| Function | Description | Input | Output |
|----------|-------------|-------|--------|
| `resize` | Resize image | Image + dimensions | Resized image |
| `crop` | Crop region | Image + bounding box | Cropped image |
| `rotate` | Rotate image | Image + angle | Rotated image |
## Constraints
- Maximum input resolution: 4096x4096
- Supported formats: PNG, JPEG, WebP
Example: Video Analysis Skill
---
name: finger-kitting-labeling
description: Detect and label finger-kitting interactions in assembly videos
toolsets:
- core
- video
---
# Finger Kitting Labeling
## Objective
Analyze assembly videos to detect finger-kitting interactions where
an operator picks components from bins.
## Analysis Strategy
1. Watch the full video to understand the assembly workflow
2. Identify each kitting interaction by timestamp
3. Classify the interaction type
4. Record start and end times in MM:SS format
## Output Requirements
- Each interaction must include reasoning, description, and timestamps
- Use the exact category names defined in the schema
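The MM:SS timestamp format requested in the example above can be produced and checked with two small helpers. This is a sketch with hypothetical function names; it truncates fractional seconds and lets the minutes field grow past 59, since the example does not say how videos longer than an hour should be written.

```python
def to_mm_ss(seconds: float) -> str:
    """Format a non-negative offset in seconds as MM:SS, truncating fractions."""
    if seconds < 0:
        raise ValueError("timestamp cannot be negative")
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def from_mm_ss(stamp: str) -> int:
    """Parse an MM:SS string back into whole seconds."""
    minutes_text, sep, seconds_text = stamp.partition(":")
    if not sep:
        raise ValueError(f"expected MM:SS, got {stamp!r}")
    minutes, secs = int(minutes_text), int(seconds_text)
    if minutes < 0 or not 0 <= secs < 60:
        raise ValueError(f"out-of-range timestamp: {stamp!r}")
    return minutes * 60 + secs
```

Round-tripping a value through both helpers is a quick way to confirm that the recorded start and end times are well formed and that each start precedes its end.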
Write instructions as if you’re briefing an expert analyst. Be specific about what to look for, how to classify it, and what format to use for the output.