Segment & Track

Create pixel-level segmentation masks for objects that stay temporally consistent across video frames. Typical applications include video editing, autonomous driving, medical imaging, and augmented reality.

Key Features

Temporal Consistency: Maintain consistent object masks across video frames
Instance Tracking: Track individual objects with unique IDs throughout the video
Motion-Aware Segmentation: Adapt to object movement and deformation
Multi-Object Handling: Segment multiple objects simultaneously
Smooth Transitions: Ensure smooth mask transitions between frames
Real-time Processing: Efficient processing suitable for live video streams

Response Format

{
  "frames": [
    {
      "frame_number": 1,
      "timestamp": "00:00:01",
      "segments": [
        {
          "id": "person_001",
          "class_name": "person",
          "confidence": 0.94,
          "area": 15420,
          "bbox": {
            "x": 100,
            "y": 150,
            "width": 80,
            "height": 200
          },
          "mask_url": "https://api.vlmrun.com/masks/frame_1_person_001.png",
          "tracking_confidence": 0.92
        }
      ]
    }
  ],
  "object_tracking": {
    "unique_objects": 5,
    "tracking_consistency": 0.89,
    "object_lifespans": [
      {
        "object_id": "person_001",
        "first_frame": 1,
        "last_frame": 150,
        "total_frames": 150
      }
    ]
  },
  "processing_time": "2m 15s"
}
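A response shaped like the one above is organized by frame, while many downstream tasks want one track per object. The following is a minimal sketch, assuming the response has already been parsed into a Python dict with exactly the field names shown in the sample; `tracks_by_object` is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict


def tracks_by_object(result: dict) -> dict:
    """Regroup a frame-indexed response into one observation list per object id."""
    tracks = defaultdict(list)
    # Sort by frame_number so each track comes out in temporal order.
    for frame in sorted(result.get("frames", []), key=lambda f: f["frame_number"]):
        for seg in frame.get("segments", []):
            tracks[seg["id"]].append(
                {
                    "frame_number": frame["frame_number"],
                    "bbox": seg["bbox"],
                    "confidence": seg["confidence"],
                    "mask_url": seg["mask_url"],
                }
            )
    return dict(tracks)


# Trimmed copy of the sample response above.
sample = {
    "frames": [
        {
            "frame_number": 1,
            "timestamp": "00:00:01",
            "segments": [
                {
                    "id": "person_001",
                    "class_name": "person",
                    "confidence": 0.94,
                    "bbox": {"x": 100, "y": 150, "width": 80, "height": 200},
                    "mask_url": "https://api.vlmrun.com/masks/frame_1_person_001.png",
                }
            ],
        }
    ]
}

tracks = tracks_by_object(sample)
print(list(tracks))  # object ids found in the response
```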

Supported Object Classes

Common Objects

People: person, face, hand, foot
Vehicles: car, truck, bus, motorcycle, bicycle
Animals: dog, cat, bird, horse, cow, sheep
Furniture: chair, table, bed, sofa, desk
Electronics: laptop, phone, tv, keyboard, mouse

Specialized Categories

Medical: organ, tissue, lesion, bone
Nature: tree, grass, sky, water, mountain
Indoor: wall, floor, ceiling, door, window
Outdoor: road, sidewalk, building, sign, traffic_light

Temporal Consistency Features

Object Tracking

Consistent IDs: Maintain the same object ID across all frames
Occlusion Handling: Track objects even when partially hidden
Re-identification: Recover object identity after temporary occlusion
Entry/Exit Detection: Identify when objects enter or leave the frame
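Entry and exit points can also be recovered on the client side from the per-frame segments. The sketch below is illustrative only: it assumes the `frames` list from the response format, and it counts `total_frames` as the number of frames in which an object actually appears, which may differ from how the service fills in `object_lifespans`.

```python
def object_lifespans(frames: list) -> list:
    """Derive first/last frame and appearance count for every object id."""
    spans = {}  # object id -> (first_frame, last_frame, frames_seen)
    for frame in frames:
        n = frame["frame_number"]
        for seg in frame.get("segments", []):
            first, last, seen = spans.get(seg["id"], (n, n, 0))
            spans[seg["id"]] = (min(first, n), max(last, n), seen + 1)
    return [
        {"object_id": oid, "first_frame": f, "last_frame": l, "total_frames": c}
        for oid, (f, l, c) in sorted(spans.items())
    ]


# person_001 is missing in frame 3 (for example, occluded) and reappears in frame 4.
frames = [
    {"frame_number": 1, "segments": [{"id": "person_001"}]},
    {"frame_number": 2, "segments": [{"id": "person_001"}, {"id": "car_001"}]},
    {"frame_number": 3, "segments": [{"id": "car_001"}]},
    {"frame_number": 4, "segments": [{"id": "person_001"}]},
]

lifespans = object_lifespans(frames)
```

A gap between `last_frame - first_frame + 1` and `total_frames` indicates frames where the object was not segmented, which is one way to spot occlusions.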

Motion-Aware Segmentation

Deformation Handling: Adapt to object shape changes over time
Scale Changes: Handle objects moving closer or farther from camera
Rotation Tracking: Maintain accurate masks during object rotation
Partial Occlusion: Keep segmenting the visible part of an object while it is partially hidden

Smooth Transitions

Interpolation: Fill in missing segments using temporal context
Boundary Smoothing: Ensure smooth mask boundaries across frames
Consistency Scoring: Measure and maintain segmentation quality
Error Correction: Automatically correct segmentation errors
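One common way to quantify frame-to-frame consistency is the intersection-over-union (IoU) of an object's masks in consecutive frames. The sketch below is an assumption for illustration, not the service's scoring method: masks are taken to be already decoded into equally sized 2D lists of 0/1 values, and the score returned by the service (such as `tracking_consistency`) may be computed differently.

```python
def mask_iou(a: list, b: list) -> float:
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks are treated as perfectly consistent.
    return inter / union if union else 1.0


def temporal_consistency(masks: list) -> float:
    """Mean IoU over consecutive mask pairs of one object."""
    pairs = list(zip(masks, masks[1:]))
    if not pairs:
        return 1.0
    return sum(mask_iou(a, b) for a, b in pairs) / len(pairs)


m1 = [[1, 1, 0], [0, 0, 0]]
m2 = [[0, 1, 1], [0, 0, 0]]  # object shifted one pixel to the right
m3 = [[0, 1, 1], [0, 0, 0]]  # object did not move

score = temporal_consistency([m1, m2, m3])
```

Fast-moving objects naturally score lower on this measure, so a low value is a hint to inspect the masks rather than proof of a segmentation error.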

Advanced Features

Multi-Object Tracking

Simultaneous Tracking: Track multiple objects of the same class
Interaction Analysis: Understand relationships between tracked objects
Collision Detection: Identify when objects interact or collide
Group Behavior: Analyze collective movement patterns
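A simple client-side proxy for interaction between tracked objects is bounding-box overlap within a single frame. The sketch below assumes the `bbox` layout from the response format (`x`, `y`, `width`, `height`, with `x`/`y` taken to be the top-left corner); `bbox_iou` and `overlapping_pairs` are hypothetical helpers, and the service may detect interactions in a different way.

```python
from itertools import combinations


def bbox_iou(a: dict, b: dict) -> float:
    """IoU of two boxes given as {x, y, width, height}."""
    iw = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    ih = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = iw * ih
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union else 0.0


def overlapping_pairs(segments: list, threshold: float = 0.0) -> list:
    """Return id pairs whose boxes overlap by more than `threshold` IoU."""
    return [
        (s["id"], t["id"])
        for s, t in combinations(segments, 2)
        if bbox_iou(s["bbox"], t["bbox"]) > threshold
    ]


segments = [
    {"id": "person_001", "bbox": {"x": 100, "y": 150, "width": 80, "height": 200}},
    {"id": "person_002", "bbox": {"x": 140, "y": 200, "width": 80, "height": 200}},
    {"id": "car_001", "bbox": {"x": 400, "y": 0, "width": 10, "height": 10}},
]

pairs = overlapping_pairs(segments)
```

Box overlap is coarse: two objects can have overlapping boxes without touching, so comparing the masks themselves gives a stricter check.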

Real-time Processing

Live Stream Support: Process video streams in real-time
Adaptive Quality: Adjust processing quality based on available resources
Streaming Output: Provide segmentation data as it becomes available
Low Latency: Minimal delay between input and output

Custom Models

Domain-Specific: Train models for specific use cases and environments
Object-Specific: Specialized models for particular object types
Style Adaptation: Adapt to different video styles and conditions
Performance Optimization: Optimize for specific hardware requirements

Mask Formats

PNG Masks

Frame-by-Frame: Individual mask images for each frame
Binary or Grayscale: Binary masks mark a single object as foreground; in grayscale masks, each pixel value represents a segment ID
Compatible: Works with most video editing software
Efficient: Small file size for simple segmentations
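Once a per-object PNG mask has been downloaded and decoded, fields such as `area` and `bbox` can be recomputed locally as a sanity check. The sketch below assumes the mask is a binary mask already decoded into a 2D list (decoding the PNG itself is left to an imaging library of your choice), that `area` means the foreground pixel count, and that `bbox` uses pixel coordinates with a top-left origin. None of these assumptions is confirmed by this page.

```python
def mask_stats(mask: list) -> dict:
    """Compute foreground pixel count and tight bounding box of a binary mask."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:  # any non-zero pixel counts as foreground
                xs.append(x)
                ys.append(y)
    if not xs:
        return {"area": 0, "bbox": None}
    return {
        "area": len(xs),
        "bbox": {
            "x": min(xs),
            "y": min(ys),
            "width": max(xs) - min(xs) + 1,
            "height": max(ys) - min(ys) + 1,
        },
    }


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

stats = mask_stats(mask)
```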

Video Masks

Animated Masks: Video files showing segmentation over time
Smooth Playback: Consistent frame rate and timing
Transparency Support: Alpha channel for overlay applications
Multiple Formats: MP4, MOV, and AVI are supported

JSON Metadata

Structured Data: Complete segmentation information in JSON format
Tracking Data: Object IDs, trajectories, and relationships
Analysis Ready: Easy to process for further analysis
API Compatible: Direct integration with other systems

Execute Agent

import time
from pathlib import Path

from vlmrun.client import VLMRun

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Upload the video
file = client.files.upload(file=Path("sports_game.mp4"))

# Execute video segmentation
response = client.agent.execute(
    inputs={"file": file.public_url},
    config={
        "prompt": "Segment all people in this video and track them across frames",
        "capability": "video_segmentation",
        "object_classes": ["person"],
        "track_objects": True,
        "output_format": "video_mask"
    }
)

# Poll every 2 seconds until the execution completes or fails
while True:
    execution = client.agent.executions.get(execution_id=response.execution_id)
    if execution.status == "completed":
        print(execution.response)
        break
    elif execution.status == "failed":
        print(f"Error: {execution.error}")
        break
    time.sleep(2)
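The polling loop above runs until the execution reaches `completed` or `failed`, so a stuck execution would block forever. A bounded variant can be sketched as follows. `wait_for_execution` is a hypothetical helper, not part of the SDK; it only assumes that the object returned by `fetch` exposes a `status` attribute with the two terminal values used above.

```python
import time


def wait_for_execution(fetch, execution_id, timeout_s=600.0, interval_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch(execution_id)` until a terminal status or until the timeout."""
    deadline = clock() + timeout_s
    while True:
        execution = fetch(execution_id)
        if execution.status in ("completed", "failed"):
            return execution
        if clock() >= deadline:
            raise TimeoutError(
                f"execution {execution_id} still '{execution.status}' after {timeout_s}s"
            )
        sleep(interval_s)


# Intended usage with the client from the example above (not executed here):
# execution = wait_for_execution(
#     lambda eid: client.agent.executions.get(execution_id=eid),
#     response.execution_id,
# )
```

The `sleep` and `clock` parameters exist only so the helper can be exercised in tests without real waiting.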

Best Practices

Stable Video: Use stable, high-resolution video for accurate segmentation
Good Lighting: Ensure consistent lighting throughout the video
Clear Objects: Avoid heavily overlapping objects when possible
Appropriate Frame Rate: Use sufficient frame rate for smooth tracking

Try Video Segmentation

Experience video segmentation with live examples in our interactive notebook
