Create pixel-level segmentation masks for objects across video frames while keeping those masks temporally consistent. Typical applications include video editing, autonomous driving, medical imaging, and augmented reality.

Key Features

  • Temporal Consistency: Maintain consistent object masks across video frames
  • Instance Tracking: Track individual objects with unique IDs throughout the video
  • Motion-Aware Segmentation: Adapt to object movement and deformation
  • Multi-Object Handling: Segment multiple objects simultaneously
  • Smooth Transitions: Ensure smooth mask transitions between frames
  • Real-time Processing: Efficient processing suitable for live video streams

Response Format

{
  "frames": [
    {
      "frame_number": 1,
      "timestamp": "00:00:01",
      "segments": [
        {
          "id": "person_001",
          "class_name": "person",
          "confidence": 0.94,
          "area": 15420,
          "bbox": {
            "x": 100,
            "y": 150,
            "width": 80,
            "height": 200
          },
          "mask_url": "https://api.vlmrun.com/masks/frame_1_person_001.png",
          "tracking_confidence": 0.92
        }
      ]
    }
  ],
  "object_tracking": {
    "unique_objects": 5,
    "tracking_consistency": 0.89,
    "object_lifespans": [
      {
        "object_id": "person_001",
        "first_frame": 1,
        "last_frame": 150,
        "total_frames": 150
      }
    ]
  },
  "processing_time": "2m 15s"
}
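
As a rough illustration of how this response might be consumed, the sketch below walks the `frames` list and collects the mask URLs of segments at or above a confidence threshold. It assumes the response has already been parsed into a Python dict with the field names shown above. The helper name `confident_masks` and the threshold value are hypothetical and not part of the API.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = json.loads("""
{
  "frames": [
    {
      "frame_number": 1,
      "timestamp": "00:00:01",
      "segments": [
        {
          "id": "person_001",
          "class_name": "person",
          "confidence": 0.94,
          "mask_url": "https://api.vlmrun.com/masks/frame_1_person_001.png",
          "tracking_confidence": 0.92
        }
      ]
    }
  ]
}
""")


def confident_masks(result, min_confidence=0.9):
    """Return (frame_number, object_id, mask_url) for segments at or above the threshold.

    Hypothetical helper: it only relies on the field names in the sample response.
    """
    rows = []
    for frame in result.get("frames", []):
        for segment in frame.get("segments", []):
            if segment["confidence"] >= min_confidence:
                rows.append((frame["frame_number"], segment["id"], segment["mask_url"]))
    return rows


masks = confident_masks(SAMPLE)
```

Filtering on `confidence` before downloading masks is one way to avoid fetching low-quality segments; the same loop could filter on `tracking_confidence` instead.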

Supported Object Classes

Common Objects

  • People: person, face, hand, foot
  • Vehicles: car, truck, bus, motorcycle, bicycle
  • Animals: dog, cat, bird, horse, cow, sheep
  • Furniture: chair, table, bed, sofa, desk
  • Electronics: laptop, phone, tv, keyboard, mouse

Specialized Categories

  • Medical: organ, tissue, lesion, bone
  • Nature: tree, grass, sky, water, mountain
  • Indoor: wall, floor, ceiling, door, window
  • Outdoor: road, sidewalk, building, sign, traffic_light

Temporal Consistency Features

Object Tracking

  • Consistent IDs: Maintain the same object ID across all frames
  • Occlusion Handling: Track objects even when partially hidden
  • Re-identification: Recover object identity after temporary occlusion
  • Entry/Exit Detection: Identify when objects enter or leave the frame
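
To make the relationship between consistent IDs and entry/exit detection concrete, here is a minimal sketch that derives per-object lifespans from per-frame segments, mirroring the `object_lifespans` entries in the response format. It is a client-side illustration under the assumption that each segment carries a stable `id`, not a description of how the service computes these values. The function name and the sample data are hypothetical.

```python
def object_lifespans(frames):
    """Map each object ID to its first frame, last frame, and number of frames seen."""
    spans = {}
    for frame in frames:
        number = frame["frame_number"]
        for segment in frame["segments"]:
            span = spans.setdefault(
                segment["id"],
                {"first_frame": number, "last_frame": number, "total_frames": 0},
            )
            span["first_frame"] = min(span["first_frame"], number)
            span["last_frame"] = max(span["last_frame"], number)
            span["total_frames"] += 1
    return spans


# Hypothetical data: person_001 is visible in frames 1-3,
# car_001 enters at frame 2 and has left by frame 3.
frames = [
    {"frame_number": 1, "segments": [{"id": "person_001"}]},
    {"frame_number": 2, "segments": [{"id": "person_001"}, {"id": "car_001"}]},
    {"frame_number": 3, "segments": [{"id": "person_001"}]},
]
spans = object_lifespans(frames)
```

When `total_frames` is smaller than `last_frame - first_frame + 1`, the object was missing from some frames in between, which may indicate an occlusion followed by re-identification.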

Motion-Aware Segmentation

  • Deformation Handling: Adapt to object shape changes over time
  • Scale Changes: Handle objects moving closer or farther from camera
  • Rotation Tracking: Maintain accurate masks during object rotation
  • Partial Occlusion: Continue tracking when objects are partially hidden

Smooth Transitions

  • Interpolation: Fill in missing segments using temporal context
  • Boundary Smoothing: Ensure smooth mask boundaries across frames
  • Consistency Scoring: Measure and maintain segmentation quality
  • Error Correction: Automatically correct segmentation errors
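
The interpolation idea can be sketched on bounding boxes: if an object is missing from a frame but present in frames before and after it, a simple client-side fallback is to interpolate linearly between the two known boxes. This is only an illustration of the concept, assuming the `bbox` layout from the response format. It is not the method the service uses, and `interpolate_bbox` is a hypothetical helper.

```python
def interpolate_bbox(box_a, frame_a, box_b, frame_b, frame):
    """Linearly interpolate a bbox dict for a frame strictly between frame_a and frame_b."""
    if not frame_a < frame < frame_b:
        raise ValueError("frame must lie strictly between frame_a and frame_b")
    t = (frame - frame_a) / (frame_b - frame_a)
    return {
        key: round(box_a[key] + t * (box_b[key] - box_a[key]))
        for key in ("x", "y", "width", "height")
    }


# Hypothetical boxes for the same object at frames 10 and 14.
before = {"x": 100, "y": 150, "width": 80, "height": 200}
after = {"x": 120, "y": 150, "width": 80, "height": 210}

# Estimate the box for frame 12, halfway between the two known frames.
filled = interpolate_bbox(before, 10, after, 14, 12)
```

Linear interpolation is a reasonable guess only for short gaps and roughly constant motion; longer gaps are better left unfilled.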

Advanced Features

Multi-Object Tracking

  • Simultaneous Tracking: Track multiple objects of the same class
  • Interaction Analysis: Understand relationships between tracked objects
  • Collision Detection: Identify when objects interact or collide
  • Group Behavior: Analyze collective movement patterns
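
One common way to approximate interaction or collision between two tracked objects on the client side is to measure how much their bounding boxes overlap in a given frame. The sketch below computes intersection over union (IoU) for two boxes in the `bbox` layout from the response format. It is an assumption-laden illustration rather than the service's own collision logic, and `bbox_iou` is a hypothetical helper.

```python
def bbox_iou(a, b):
    """Intersection over union of two bbox dicts with x, y, width, and height keys."""
    left = max(a["x"], b["x"])
    top = max(a["y"], b["y"])
    right = min(a["x"] + a["width"], b["x"] + b["width"])
    bottom = min(a["y"] + a["height"], b["y"] + b["height"])
    # Clamp at zero so boxes that do not overlap contribute no intersection.
    intersection = max(0, right - left) * max(0, bottom - top)
    union = a["width"] * a["height"] + b["width"] * b["height"] - intersection
    return intersection / union if union else 0.0


# Hypothetical boxes from a single frame.
person = {"x": 100, "y": 150, "width": 80, "height": 200}
neighbor = {"x": 140, "y": 150, "width": 80, "height": 200}
distant = {"x": 400, "y": 150, "width": 80, "height": 200}

overlap = bbox_iou(person, neighbor)
no_overlap = bbox_iou(person, distant)
```

A threshold on this value (chosen per application) can flag candidate interactions; comparing the masks themselves gives a tighter answer than boxes when objects are irregularly shaped.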

Real-time Processing

  • Live Stream Support: Process video streams in real-time
  • Adaptive Quality: Adjust processing quality based on available resources
  • Streaming Output: Provide segmentation data as it becomes available
  • Low Latency: Minimal delay between input and output

Custom Models

  • Domain-Specific: Train models for specific use cases and environments
  • Object-Specific: Specialized models for particular object types
  • Style Adaptation: Adapt to different video styles and conditions
  • Performance Optimization: Optimize for specific hardware requirements

Mask Formats

PNG Masks

  • Frame-by-Frame: Individual mask images for each frame
  • Binary or Grayscale: Each pixel value represents a segment ID
  • Compatible: Works with most video editing software
  • Efficient: Small file size for simple segmentations
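
To show how a mask whose pixel values are segment IDs relates to the `area` and `bbox` fields in the response, the sketch below computes both from a small grid. It assumes the PNG has already been decoded into a 2D list of integers (for example with an image library) and that 0 marks the background. Both assumptions, and the helper name `segment_stats`, are illustrative rather than documented behavior.

```python
def segment_stats(mask, segment_id):
    """Return the pixel area and bounding box of one segment ID, or None if it is absent."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == segment_id:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return {
        "area": len(xs),
        "bbox": {
            "x": min(xs),
            "y": min(ys),
            "width": max(xs) - min(xs) + 1,
            "height": max(ys) - min(ys) + 1,
        },
    }


# Hypothetical 4x5 mask: 0 is background, 1 and 2 are segment IDs.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 0, 1, 0, 0],
]
stats = segment_stats(mask, 1)
```

For full-resolution frames, an array library would be much faster than nested loops; the plain-Python version is kept here only for clarity.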

Video Masks

  • Animated Masks: Video files showing segmentation over time
  • Smooth Playback: Consistent frame rate and timing
  • Transparency Support: Alpha channel for overlay applications
  • Multiple Formats: MP4, MOV, AVI support

JSON Metadata

  • Structured Data: Complete segmentation information in JSON format
  • Tracking Data: Object IDs, trajectories, and relationships
  • Analysis Ready: Easy to process for further analysis
  • API Compatible: Direct integration with other systems

Execute Agent

import time
from pathlib import Path

from vlmrun.client import VLMRun

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Upload the video
file = client.files.upload(file=Path("sports_game.mp4"))

# Execute video segmentation
response = client.agent.execute(
    inputs={"file": file.public_url},
    config={
        "prompt": "Segment all people in this video and track them across frames",
        "capability": "video_segmentation",
        "object_classes": ["person"],
        "track_objects": True,
        "output_format": "video_mask",
    },
)

# Poll every 2 seconds until the execution completes or fails
while True:
    execution = client.agent.executions.get(execution_id=response.execution_id)
    if execution.status == "completed":
        print(execution.response)
        break
    elif execution.status == "failed":
        print(f"Error: {execution.error}")
        break
    time.sleep(2)

Best Practices

  • Stable Video: Use stable, high-resolution video for accurate segmentation
  • Good Lighting: Ensure consistent lighting throughout the video
  • Clear Objects: Avoid heavily overlapping objects when possible
  • Appropriate Frame Rate: Use sufficient frame rate for smooth tracking

Try Video Segmentation

Experience video segmentation with live examples in our interactive notebook