SKILL.md is the primary file in a skill directory. It combines YAML frontmatter for metadata with a Markdown body for instructions that guide the model or agent during execution.
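As a rough illustration of that two-part layout, the sketch below splits a SKILL.md string into its frontmatter text and its Markdown body. It assumes the file opens with a `---` line and that a second `---` line closes the frontmatter; the function name `split_skill_md` and the sample body text are hypothetical, not part of any documented API.

```python
def split_skill_md(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body).

    Assumption: the first line is '---' and a later '---' line closes
    the frontmatter block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' delimiter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter is missing its closing '---' delimiter")


# Illustrative input; the body text here is made up for the example.
example = """---
name: invoice-extraction
description: Extract structured data from invoice documents
---

# Invoice Extraction
Extract the vendor, date, and total.
"""

frontmatter, body = split_skill_md(example)
print(frontmatter)
print(body)
```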

Frontmatter Fields

The YAML frontmatter defines the skill’s metadata:
---
name: invoice-extraction
description: Extract structured data from invoice documents
version: "1.0"
license: MIT
---
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Skill identifier, used for lookup via `skill_name` |
| `description` | string | Yes | Concise description of what the skill does |
| `skill_version` | string | No | Skill version string |
| `license` | string | No | License type (e.g., MIT, Apache-2.0) |
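One way to check the required fields before using a skill is sketched below. The parser is deliberately minimal and only handles flat `key: value` pairs and `- item` lists; a real loader would more likely rely on a full YAML parser. The helper names `parse_simple_frontmatter` and `missing_required` are hypothetical.

```python
REQUIRED_FIELDS = ("name", "description")  # the fields marked required above


def parse_simple_frontmatter(frontmatter: str) -> dict:
    """Parse flat `key: value` pairs and `- item` lists (not full YAML)."""
    data = {}
    current_key = None
    for raw in frontmatter.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- ") and current_key is not None:
            if data[current_key] == "":
                data[current_key] = []  # first item of a list value
            if not isinstance(data[current_key], list):
                raise ValueError(f"unexpected list item: {raw!r}")
            data[current_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"cannot parse frontmatter line: {raw!r}")
        current_key = key.strip()
        data[current_key] = value.strip().strip("\"'")
    return data


def missing_required(data: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not data.get(field)]


fm = parse_simple_frontmatter(
    'name: invoice-extraction\n'
    'description: Extract structured data from invoice documents\n'
    'version: "1.0"\n'
    'license: MIT'
)
print(missing_required(fm))  # → []
```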

Available Toolsets

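The examples below select toolsets through a `toolsets` list in the frontmatter. The available values are:

| Toolset | Description |
|---------|-------------|
| `core` | Basic operations (file I/O, text processing) |
| `document` | Document extraction and layout understanding |
| `image` | Image analysis and understanding |
| `image-gen` | Image generation and editing |
| `video` | Video analysis and understanding |
| `viz` | Visualization and annotation |
| `web` | Web search and retrieval |
| `world-gen` | World generation and editing |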
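A small check like the following can catch misspelled toolset names before a skill is used. It is a sketch: the set simply mirrors the toolsets listed above, and the helper name `unknown_toolsets` is hypothetical.

```python
# Names copied from the toolset list above.
AVAILABLE_TOOLSETS = {
    "core", "document", "image", "image-gen",
    "video", "viz", "web", "world-gen",
}


def unknown_toolsets(requested: list) -> list:
    """Return requested toolset names that are not in the list, in order."""
    return [name for name in requested if name not in AVAILABLE_TOOLSETS]


print(unknown_toolsets(["core", "video"]))   # → []
print(unknown_toolsets(["image", "audio"]))  # → ['audio']
```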

Markdown Body

The body after the frontmatter contains instructions that are injected into the model or agent prompt at execution time. Write clear, specific instructions for the extraction or analysis task.
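How the body is combined with the rest of the prompt is not specified here. Purely as an assumption for illustration, the sketch below places the skill body first and appends a task section; the function `build_prompt` and the `## Task` heading are hypothetical.

```python
def build_prompt(skill_body: str, task: str) -> str:
    """Hypothetical assembly: skill instructions first, then the task."""
    return f"{skill_body.strip()}\n\n## Task\n{task.strip()}\n"


prompt = build_prompt(
    "# Invoice Extraction\nExtract the vendor, date, and total.",
    "Process the attached invoice.",
)
print(prompt)
```

Whatever the actual assembly looks like, the body reaches the model as plain text, so headings and lists in it should make sense when read in isolation.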

Example: Image Analysis Skill

---
name: pillow
description: Image manipulation toolkit using Pillow (PIL)
license: MIT
toolsets:
  - image
---

# Pillow Image Processing

## Description
A comprehensive image processing skill using the Pillow library.

## Capabilities

| Function | Description | Input | Output |
|----------|-------------|-------|--------|
| `resize` | Resize image | Image + dimensions | Resized image |
| `crop` | Crop region | Image + bounding box | Cropped image |
| `rotate` | Rotate image | Image + angle | Rotated image |

## Constraints
- Maximum input resolution: 4096x4096
- Supported formats: PNG, JPEG, WebP
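The constraints in this example skill are instructions for the model, but the same limits could also be checked in code. The sketch below assumes the 4096x4096 limit applies to each dimension separately and compares format names case-insensitively; `check_constraints` is a hypothetical helper.

```python
MAX_WIDTH = MAX_HEIGHT = 4096  # assumption: limit applies per dimension
SUPPORTED_FORMATS = {"PNG", "JPEG", "WEBP"}


def check_constraints(width: int, height: int, fmt: str) -> list:
    """Return a list of constraint violations (empty if the input is fine)."""
    problems = []
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        problems.append(
            f"resolution {width}x{height} exceeds {MAX_WIDTH}x{MAX_HEIGHT}"
        )
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    return problems


print(check_constraints(1920, 1080, "WebP"))  # → []
print(check_constraints(8000, 1080, "GIF"))
```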

Example: Video Analysis Skill

---
name: finger-kitting-labeling
description: Detect and label finger-kitting interactions in assembly videos
toolsets:
  - core
  - video
---

# Finger Kitting Labeling

## Objective
Analyze assembly videos to detect finger-kitting interactions where
an operator picks components from bins.

## Analysis Strategy
1. Watch the full video to understand the assembly workflow
2. Identify each kitting interaction by timestamp
3. Classify the interaction type
4. Record start and end times in MM:SS format

## Output Requirements
- Each interaction must include reasoning, description, and timestamps
- Use the exact category names defined in the schema
Write instructions as if you’re briefing an expert analyst. Be specific about what to look for, how to classify it, and what format to use for the output.
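
The output requirements in the video example (reasoning, description, and MM:SS timestamps) are precise enough to be checked mechanically. The sketch below shows one possible check; the record keys `start` and `end` and both helper names are assumptions, and category names are left out because the schema is not shown here.

```python
import re

# Two or more minute digits, then seconds from 00 to 59.
MMSS = re.compile(r"^\d{2,}:[0-5]\d$")
REQUIRED_KEYS = ("reasoning", "description", "start", "end")  # assumed keys


def to_mmss(seconds: int) -> str:
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def validate_interaction(record: dict) -> list:
    """Return a list of problems found in one interaction record."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not record.get(key)]
    for key in ("start", "end"):
        value = record.get(key)
        if value and not MMSS.match(value):
            problems.append(f"{key} is not MM:SS: {value!r}")
    return problems


record = {
    "reasoning": "Hand enters the bin and leaves holding a part.",
    "description": "Operator picks a component from a bin.",
    "start": to_mmss(75),
    "end": to_mmss(82),
}
print(record["start"], record["end"])  # → 01:15 01:22
print(validate_interaction(record))    # → []
```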