What is Segmentation?
Segmentation is the process of dividing video content into smaller, meaningful segments. Think of it as creating chapters or scenes within your video that can be processed, analyzed, and referenced independently. Each segment represents a distinct portion of your video with specific start and end timestamps.

Unlike processing an entire video as one unit, segmentation allows you to work with granular pieces of content. For example, in a 30-minute cooking video, you might have segments for ingredient preparation (0-5 minutes), cooking process (5-20 minutes), and final presentation (20-30 minutes). Each segment can then be transcribed, analyzed for entities, or processed independently.

Segmentation is particularly powerful when combined with other Cloudglue operations like entity extraction or description, as it provides temporal context and allows for segment-specific analysis.

Segmentation Strategies
Cloudglue offers four primary segmentation strategies, each optimized for different use cases.

Uniform Segmentation
Uniform segmentation divides video content into fixed-duration segments with consistent timing intervals. This approach is ideal when you need predictable, evenly-spaced segments regardless of the video's content.

Key Parameters:
- Window Seconds: The duration of each segment (e.g., 30 seconds)
- Hop Seconds: The interval between segment starts, enabling overlapping segments if desired
Use Cases:
- Regular Content Analysis: Perfect for systematic analysis where you need consistent time intervals
- Performance Optimization: Uniform segments enable predictable processing loads and parallel execution
- Time-based Indexing: Ideal for applications that reference content by specific time intervals
- Overlapping Analysis: When hop_seconds < window_seconds, you get overlapping segments for comprehensive coverage
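The window/hop interplay can be sketched in a few lines of Python. This is an illustrative boundary calculation under the assumption that hop_seconds is the stride between segment starts, not Cloudglue's internal implementation:

```python
# Enumerate uniform segment boundaries: window_seconds is each segment's
# length, hop_seconds is the stride between consecutive segment starts.
def uniform_segments(duration, window_seconds, hop_seconds):
    segments = []
    start = 0.0
    while start < duration:
        # The final segment is clipped to the video's duration.
        segments.append((start, min(start + window_seconds, duration)))
        start += hop_seconds
    return segments

# Non-overlapping segments: hop == window
uniform_segments(90, 30, 30)  # → [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]
# Overlapping segments: hop < window yields 15 s of overlap per pair
uniform_segments(90, 30, 15)
```

With hop_seconds of 15 and a window of 30, every moment of the video (after the first 15 seconds) is covered by two segments, which is what makes overlapping analysis possible.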
Shot Detection Segmentation
Shot detection segmentation uses computer vision to identify natural scene changes and transitions in video content. This approach creates segments that align with the video's actual content structure.

Key Parameters:
- Detector Strategy: Choose from adaptive or content-based detection
- Threshold: Detection sensitivity (strategy-specific, lower values = more sensitive)
- Min/Max Seconds: Constraints on segment duration to prevent overly short or long segments (defaults: 1 second minimum, 300 seconds maximum)
- Fill Gaps: When true (the default), gaps between detected shots are filled to ensure complete timeline coverage. Gaps ≥ min_seconds become their own segments (split by max_seconds if needed), and shorter gaps are merged into the nearest adjacent segment. Set to false to preserve only the raw detected shot boundaries.
- Adaptive Detector: Designed for dynamic footage with camera movement, panning, or action. Adapts to motion patterns to avoid false scene breaks during camera moves or fast action sequences. Examples: sports broadcasts, drone footage, handheld documentaries, action movies, live event recordings.
- Content Detector: Optimized for controlled footage with clear visual transitions. Focuses on color and lighting changes to identify clean cuts between distinct scenes or shots. Examples: studio interviews, corporate videos, educational content, product demos, scripted content with traditional editing.
Use Cases:
- Content-Aware Processing: Segments align with natural scene boundaries and visual transitions
- Narrative Structure: Perfect for videos with distinct scenes, like interviews, presentations, or storytelling
- Efficient Processing: Avoids artificial breaks mid-scene, preserving contextual integrity
- Dynamic Content: Adapts to the video’s natural rhythm rather than imposing fixed intervals
If shot detection finds no usable scene boundaries, Cloudglue falls back to uniform segmentation, using max_seconds as the window size if specified, otherwise defaulting to 20-second segments. This ensures you always receive usable segments rather than an error.
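The Fill Gaps behavior can be illustrated with a short sketch. This is a re-implementation for explanation only, not Cloudglue's actual code; merging a short gap into the preceding segment is a simplifying assumption (the service may pick whichever neighbor is nearest):

```python
# Fill gaps between detected shots so segments cover the full timeline.
# Gaps >= min_seconds become their own segments (split by max_seconds);
# shorter gaps are merged into an adjacent segment.
def fill_gaps(shots, duration, min_seconds=1.0, max_seconds=300.0):
    """shots: sorted (start, end) pairs; returns full-coverage segments."""
    segments = []
    cursor = 0.0
    for start, end in shots:
        gap = start - cursor
        if gap >= min_seconds:
            s = cursor
            while s < start:  # large gap becomes its own segment(s),
                segments.append([s, min(s + max_seconds, start)])
                s += max_seconds  # split by max_seconds if needed
        elif gap > 0:
            if segments:
                segments[-1][1] = start  # short gap: merge into previous
            else:
                start = cursor  # short leading gap: extend first shot back
        segments.append([start, end])
        cursor = end
    # Fill any trailing gap up to the end of the video.
    if duration - cursor >= min_seconds:
        segments.append([cursor, duration])
    elif duration > cursor and segments:
        segments[-1][1] = duration
    return segments

# A 0.5 s gap (< min_seconds) is absorbed; the 10 s tail becomes a segment:
fill_gaps([(0, 10), (10.5, 20)], duration=30)
# → [[0, 10.5], [10.5, 20], [20, 30]]
```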
Manual Segmentation
Manual segmentation lets you specify segment boundaries yourself. This is useful when you want to divide your video into specific segments based on your own criteria.

Key Parameters:
- Segments: An array of segments, each with a start and end time
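A minimal sketch of building a manual segments array from (start, end) pairs in seconds. The field names ("strategy", "start", "end") are illustrative assumptions, not necessarily the exact API schema:

```python
# Build and validate a manual segmentation config from (start, end) pairs.
# Field names here are illustrative, not the exact API schema.
def manual_segments(pairs):
    segments = []
    prev_end = 0
    for start, end in pairs:
        assert 0 <= start < end, "each segment needs start < end"
        assert start >= prev_end, "segments must not overlap"
        segments.append({"start": start, "end": end})
        prev_end = end
    return {"strategy": "manual", "segments": segments}

# The 30-minute cooking video from the introduction, as three chapters:
config = manual_segments([(0, 300), (300, 1200), (1200, 1800)])
```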
Narrative Segmentation
Narrative segmentation uses AI to identify logical chapter boundaries in your video based on content flow and topic transitions. This approach creates segments that align with the natural narrative structure of your content, similar to how a human would identify chapters in a book or documentary.

Key Parameters:
- Strategy: Choose between comprehensive (deeper analysis using vision language models) or balanced (faster analysis using multiple modalities)
- Prompt: Optional custom instructions to guide the AI's chapter identification
- Number of Chapters: Target number of chapters (optional - AI calculates based on duration if not specified)
- Min/Max Chapters: Constraints on the number of chapters to generate
- Comprehensive: Uses a vision language model (VLM) for deep analysis of video content. Best for complex videos where visual context is critical for understanding chapter boundaries. Only available for non-YouTube videos.
- Balanced: Uses a multi-modal approach combining transcript and visual analysis. Faster and more cost-effective while still providing intelligent chapter detection.
Use Cases:
- Content Chapters: Perfect for educational videos, tutorials, or documentaries where you want meaningful chapter markers
- Topic-Based Segmentation: Ideal for podcasts, interviews, or presentations that cover multiple topics
- Contextual Processing: When you need segments that preserve semantic context rather than arbitrary time-based divisions
- User Navigation: Create chapter markers that help users navigate to specific topics in your video
Narrative segmentation is not supported for YouTube URLs. Use the Segments API directly for YouTube video chapter detection.
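Putting the parameters above together, a narrative configuration might look like the following sketch. The key names are assumptions for illustration; consult the API reference for the exact schema:

```python
# Illustrative narrative-segmentation config; key names are assumptions.
narrative_config = {
    "strategy": "narrative",
    "mode": "balanced",  # "comprehensive" uses a VLM; non-YouTube only
    "prompt": "Split this lecture into self-contained topics",
    "min_chapters": 3,   # constrain how many chapters the AI may produce
    "max_chapters": 8,
}
```

If no chapter count is given, the AI calculates a target from the video's duration, so the min/max constraints are usually all you need.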
Temporal Constraints
All four segmentation strategies (uniform, shot-detector, manual, and narrative) support optional temporal constraints to focus processing on specific portions of your video:
- Start Time: Begin segmentation at a specific timestamp (useful for skipping intros or irrelevant content)
- End Time: Stop segmentation at a specific timestamp (useful for excluding outros or credits)
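The effect of these constraints can be sketched as clipping segments to a [start_time, end_time] window. This is an illustrative model of the behavior, not Cloudglue's implementation:

```python
# Clip segments to a [start_time, end_time] window; segments falling
# entirely outside the window are dropped.
def clip_segments(segments, start_time=0.0, end_time=float("inf")):
    clipped = []
    for s, e in segments:
        s2, e2 = max(s, start_time), min(e, end_time)
        if s2 < e2:  # keep only segments with time left inside the window
            clipped.append((s2, e2))
    return clipped

# Skip a 10 s intro and cut everything after 100 s:
clip_segments([(0, 30), (30, 60), (60, 120)], start_time=10, end_time=100)
# → [(10, 30), (30, 60), (60, 100)]
```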
Integration with Other Operations
Segmentation becomes particularly powerful when combined with other Cloudglue operations.

With Entity Extraction
Segmentation enables segment-level entity extraction, where you can identify when specific entities appear in your video.

With Media Description
Combine segmentation with media descriptions to get time-aligned descriptions that correspond to specific video segments.

In Collections
Segmentation can be applied at the collection level, providing consistent segmentation across all files in a collection.

Working with Segments
Once segmentation is complete, you can:
- Retrieve Segment Data: Get detailed information about each segment, including timestamps and any processed content
- Reference Specific Segments: Use segment IDs to reference specific portions of your video in other operations
- Analyze Segment-Level Results: Process results that are tied to specific temporal locations in your video
Choosing the Right Strategy
Use Uniform Segmentation when:
- You need consistent, predictable segment durations
- Processing performance and parallelization are priorities
- Content structure is less important than temporal consistency
- You’re building time-based indexing or search functionality
Use Shot Detection Segmentation when:
- Content structure and natural boundaries are important
- You’re working with visual content where scene changes matter (films, commercials)
- You want segments that respect visual transitions and scene changes
- Processing efficiency matters more than segment duration predictability
Use Narrative Segmentation when:
- You need intelligent chapter detection based on content meaning
- Working with educational content, tutorials, or presentations
- Semantic topic boundaries are more important than visual transitions
- You want AI-powered chapter markers for user navigation
- Processing podcasts, interviews, or discussions with multiple topics
Choose the Adaptive Detector for:
- Sports footage, action videos, or handheld camera work
- Content where the camera frequently moves, pans, or zooms
- Videos with fast motion that might create unwanted scene breaks
- Documentary-style footage with natural camera movement
Choose the Content Detector for:
- Professional interviews, presentations, or studio recordings
- Content with clear, intentional cuts between different scenes
- Static camera setups with controlled lighting
- Educational videos or webinars with minimal camera movement
Next Steps
To start using segmentation in your applications:
- Create Segments: Use the File Segmentation API to segment individual files
- Retrieve Segments: Use the Get Segmentation API to access segment data
- Combine with Processing: Include segmentation_config in your extraction or describe operations
- Collection-Level Segmentation: Set default_segmentation_config when creating collections
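To show where a segmentation config plugs in, here is a sketch of both placements. Only segmentation_config and default_segmentation_config come from this page; the surrounding request fields are illustrative placeholders, not the exact API schema:

```python
# A uniform segmentation config (parameters from the Uniform section above):
segmentation_config = {
    "strategy": "uniform",
    "window_seconds": 30,
    "hop_seconds": 30,
}

# Hypothetical extraction request carrying a per-operation config:
extract_request = {
    "url": "https://example.com/video.mp4",
    "prompt": "List the ingredients shown in each segment",
    "segmentation_config": segmentation_config,
}

# Hypothetical collection with a default applied to every file it contains:
collection_request = {
    "name": "cooking-videos",
    "default_segmentation_config": segmentation_config,
}
```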