What is Segmentation?

Segmentation is the process of dividing video content into smaller, meaningful segments. Think of it as creating chapters or scenes within your video that can be processed, analyzed, and referenced independently. Each segment represents a distinct portion of your video with specific start and end timestamps.

Unlike processing an entire video as one unit, segmentation allows you to work with granular pieces of content. For example, in a 30-minute cooking video, you might have segments for ingredient preparation (0-5 minutes), cooking process (5-20 minutes), and final presentation (20-30 minutes). Each segment can then be transcribed, analyzed for entities, or otherwise processed on its own.

Segmentation is particularly powerful when combined with other Cloudglue operations such as entity extraction or transcription: it provides temporal context and enables segment-specific analysis.
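As a purely illustrative sketch of what segments look like as data (the field names here are assumptions for illustration, not Cloudglue's actual response schema), the cooking video above might break down as:

segments = [
    {"start_time_seconds": 0,    "end_time_seconds": 300},   # ingredient preparation (0-5 min)
    {"start_time_seconds": 300,  "end_time_seconds": 1200},  # cooking process (5-20 min)
    {"start_time_seconds": 1200, "end_time_seconds": 1800},  # final presentation (20-30 min)
]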

Segmentation Strategies

Cloudglue offers two primary segmentation strategies, each optimized for different use cases:

Uniform Segmentation

Uniform segmentation divides video content into fixed-duration segments with consistent timing intervals. This approach is ideal when you need predictable, evenly spaced segments regardless of the video’s content.
Key Parameters:
  • Window Seconds: The duration of each segment (e.g., 30 seconds)
  • Hop Seconds: The interval between segment starts, enabling overlapping segments if desired
Use Cases:
  • Regular Content Analysis: Perfect for systematic analysis where you need consistent time intervals
  • Performance Optimization: Uniform segments enable predictable processing loads and parallel execution
  • Time-based Indexing: Ideal for applications that reference content by specific time intervals
  • Overlapping Analysis: When hop_seconds < window_seconds, you get overlapping segments for comprehensive coverage
Example Configuration:
{
  "strategy": "uniform",
  "uniform_config": {
    "window_seconds": 30,
    "hop_seconds": 20
  }
}
This creates 30-second segments that start every 20 seconds, resulting in 10 seconds of overlap between consecutive segments.
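To make the window/hop arithmetic concrete, here is a minimal Python sketch that enumerates the boundaries a uniform strategy would produce. It is illustrative only: it mirrors the documented window/hop semantics, not Cloudglue's internal implementation, and the default-hop and trailing-segment behaviors are assumptions.

def uniform_segments(duration, window_seconds, hop_seconds=None, start=0, end=None):
    """Yield (start, end) boundary pairs for uniform segmentation.

    Illustrative assumptions (not documented Cloudglue behavior): the hop
    defaults to the window when omitted, and a trailing partial segment
    is emitted if the video doesn't divide evenly.
    """
    hop = hop_seconds if hop_seconds is not None else window_seconds
    stop = duration if end is None else min(end, duration)
    t = start
    while t < stop:
        yield (t, min(t + window_seconds, stop))
        t += hop

# 30-second windows starting every 20 seconds over a 90-second clip:
print(list(uniform_segments(90, window_seconds=30, hop_seconds=20)))
# [(0, 30), (20, 50), (40, 70), (60, 90), (80, 90)]

Note the 10 seconds shared by each consecutive pair, matching the overlap described above.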

Shot Detection Segmentation

Shot detection segmentation uses computer vision to identify natural scene changes and transitions in video content. This approach creates segments that align with the video’s actual content structure.
Key Parameters:
  • Detector Strategy: Choose from adaptive or content-based detection
  • Threshold: Detection sensitivity (strategy-specific, lower values = more sensitive)
  • Min/Max Seconds: Constraints on segment duration to prevent overly short or long segments
Detector Strategies:
  • Adaptive Detector: Designed for dynamic footage with camera movement, panning, or action. Adapts to motion patterns to avoid false scene breaks during camera moves or fast action sequences. Examples: sports broadcasts, drone footage, handheld documentaries, action movies, live event recordings.
  • Content Detector: Optimized for controlled footage with clear visual transitions. Focuses on color and lighting changes to identify clean cuts between distinct scenes or shots. Examples: studio interviews, corporate videos, educational content, product demos, scripted content with traditional editing.
Use Cases:
  • Content-Aware Processing: Segments align with natural scene boundaries and visual transitions
  • Narrative Structure: Perfect for videos with distinct scenes, like interviews, presentations, or storytelling
  • Efficient Processing: Avoids artificial breaks mid-scene, preserving contextual integrity
  • Dynamic Content: Adapts to the video’s natural rhythm rather than imposing fixed intervals
Example Configurations:
Adaptive detector for dynamic footage:
{
  "strategy": "shot-detector",
  "shot_detector_config": {
    "detector": "adaptive",
    "threshold": 3.2,
    "min_seconds": 5,
    "max_seconds": 60
  }
}
Content detector for static footage:
{
  "strategy": "shot-detector",
  "shot_detector_config": {
    "detector": "content",
    "threshold": 27.5,
    "min_seconds": 3,
    "max_seconds": 45
  }
}
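Note that the two thresholds above are not on a comparable scale: as mentioned under Key Parameters, the threshold is strategy-specific, which is why the adaptive example uses 3.2 while the content example uses 27.5.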

Temporal Constraints

Both segmentation strategies support optional temporal constraints to focus processing on specific portions of your video:
  • Start Time: Begin segmentation at a specific timestamp (useful for skipping intros or irrelevant content)
  • End Time: Stop segmentation at a specific timestamp (useful for excluding outros or credits)
Example:
{
  "strategy": "uniform",
  "uniform_config": {
    "window_seconds": 20
  },
  "start_time_seconds": 60,
  "end_time_seconds": 300
}
This segments only the portion from 1 minute to 5 minutes of the video.
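Using the illustrative uniform_segments sketch from the Uniform Segmentation section (and still assuming that an omitted hop_seconds defaults to window_seconds), a 10-minute video with these constraints would yield:

# The temporal constraints simply bound the range that gets segmented:
print(list(uniform_segments(600, window_seconds=20, start=60, end=300)))
# Twelve segments: [(60, 80), (80, 100), ..., (280, 300)]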

Integration with Other Operations

Segmentation becomes particularly powerful when combined with other Cloudglue operations:

With Entity Extraction

Segmentation enables segment-level entity extraction, where you can identify when specific entities appear in your video:
{
  "file_id": "your-file-id",
  "segmentation_config": {
    "strategy": "shot-detector",
    "shot_detector_config": {
      "detector": "content"
    }
  },
  "extract_config": {
    "prompt": "Identify products and brands shown in each scene",
    "entity_schema": {
      "products": [{ "name": "string", "brand": "string" }]
    },
    "enable_video_level_entities": false,
    "enable_segment_level_entities": true
  }
}
This extracts products and brands for each natural scene in the video, with results tied to specific segments rather than the entire video.
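As a purely hypothetical sketch of what a single segment-level result might look like (the field names and values are invented for illustration; consult the API reference for the actual response schema):

segment_result = {
    # Hypothetical shape for illustration only.
    "start_time_seconds": 42.0,
    "end_time_seconds": 55.5,
    "entities": {
        "products": [{"name": "TrailRunner 2", "brand": "Acme"}],
    },
}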

With Transcription

Combine segmentation with transcription to get time-aligned transcripts that correspond to specific video segments:
{
  "file_id": "your-file-id",
  "segmentation_config": {
    "strategy": "uniform",
    "uniform_config": {
      "window_seconds": 30
    }
  },
  "transcribe_config": {
    "speech": true,
    "scene_text": true,
    "visual_scene_description": true
  }
}
This provides rich transcripts for every 30-second segment of your video.
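A hypothetical sketch of one segment’s transcript entry (again, the field names are assumptions rather than the documented schema) shows how the three transcribe_config channels line up per segment:

segment_transcript = {
    # Hypothetical shape for illustration only.
    "start_time_seconds": 30,
    "end_time_seconds": 60,
    "speech": "Now fold the egg whites gently into the batter...",
    "scene_text": ["Step 3: Fold gently"],
    "visual_scene_description": "A chef folds egg whites into batter in a glass bowl.",
}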

In Collections

Segmentation can be applied at the collection level, providing consistent segmentation across all files in a collection:
{
  "name": "Product Review Analysis",
  "collection_type": "entities",
  "default_segmentation_config": {
    "strategy": "shot-detector",
    "shot_detector_config": {
      "detector": "adaptive",
      "min_seconds": 10,
      "max_seconds": 45
    }
  }
}
All files added to this collection will automatically use shot detection segmentation.
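Because every file in the collection shares the same segmentation settings, segment-level results (such as entities per scene) remain comparable across files, which is useful when aggregating analysis over the whole collection.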

Working with Segments

Once segmentation is complete, you can:
  1. Retrieve Segment Data: Get detailed information about each segment, including timestamps and any processed content
  2. Reference Specific Segments: Use segment IDs to reference specific portions of your video in other operations
  3. Analyze Segment-Level Results: Process results that are tied to specific temporal locations in your video
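For example, once you have retrieved segment data, a small helper like the following (using the same illustrative field names as the earlier sketches) can map a timestamp back to the segment that contains it:

def segment_at(segments, t):
    """Return the segment whose time range contains timestamp t (in seconds),
    or None if no segment covers it. Field names follow the illustrative
    sketches above, not a documented schema."""
    for seg in segments:
        if seg["start_time_seconds"] <= t < seg["end_time_seconds"]:
            return seg
    return None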

Choosing the Right Strategy

Use Uniform Segmentation when:
  • You need consistent, predictable segment durations
  • Processing performance and parallelization are priorities
  • Content structure is less important than temporal consistency
  • You’re building time-based indexing or search functionality
Use Shot Detection when:
  • Content structure and natural boundaries are important
  • You’re working with narrative content (interviews, presentations, stories)
  • You want segments that respect visual transitions and scene changes
  • Processing efficiency matters more than segment duration predictability
Choose Adaptive Detector for:
  • Sports footage, action videos, or handheld camera work
  • Content where the camera frequently moves, pans, or zooms
  • Videos with fast motion that might create unwanted scene breaks
  • Documentary-style footage with natural camera movement
Choose Content Detector for:
  • Professional interviews, presentations, or studio recordings
  • Content with clear, intentional cuts between different scenes
  • Static camera setups with controlled lighting
  • Educational videos or webinars with minimal camera movement

Next Steps

To start using segmentation in your applications:
  1. Create Segments: Use the File Segmentation API to segment individual files
  2. Retrieve Segments: Use the Get Segmentation API to access segment data
  3. Combine with Processing: Include segmentation_config in your extraction or transcription operations
  4. Collection-Level Segmentation: Set default_segmentation_config when creating collections
For detailed implementation examples and advanced use cases, explore our API reference and consider how segmentation can enhance your specific video processing workflows.