What is Segmentation?
Segmentation is the process of dividing video content into smaller, meaningful segments. Think of it as creating chapters or scenes within your video that can be processed, analyzed, and referenced independently. Each segment represents a distinct portion of your video with specific start and end timestamps.

Unlike processing an entire video as one unit, segmentation allows you to work with granular pieces of content. For example, in a 30-minute cooking video, you might have segments for ingredient preparation (0-5 minutes), cooking process (5-20 minutes), and final presentation (20-30 minutes). Each segment can then be transcribed, analyzed for entities, or processed independently.

Segmentation is particularly powerful when combined with other Cloudglue operations like entity extraction or description, as it provides temporal context and allows for segment-specific analysis.

Segmentation Strategies
Cloudglue offers four primary segmentation strategies, each optimized for different use cases.

Uniform Segmentation
Uniform segmentation divides video content into fixed-duration segments with consistent timing intervals. This approach is ideal when you need predictable, evenly-spaced segments regardless of the video's content.

Key Parameters:
- Window Seconds: The duration of each segment (e.g., 30 seconds)
- Hop Seconds: The interval between segment starts, enabling overlapping segments if desired
Use Cases:
- Regular Content Analysis: Perfect for systematic analysis where you need consistent time intervals
- Performance Optimization: Uniform segments enable predictable processing loads and parallel execution
- Time-based Indexing: Ideal for applications that reference content by specific time intervals
- Overlapping Analysis: When hop_seconds < window_seconds, you get overlapping segments for comprehensive coverage
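The window/hop interplay can be sketched in a few lines of Python. This is an illustrative boundary calculation under the assumption that hop_seconds is the stride between segment starts, not Cloudglue's internal implementation:

```python
# Enumerate uniform segment boundaries: window_seconds is each segment's
# length, hop_seconds is the stride between consecutive segment starts.
def uniform_segments(duration, window_seconds, hop_seconds):
    segments = []
    start = 0.0
    while start < duration:
        # The final segment is clipped to the video's duration.
        segments.append((start, min(start + window_seconds, duration)))
        start += hop_seconds
    return segments

# Non-overlapping segments: hop == window
uniform_segments(90, 30, 30)  # → [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]
# Overlapping segments: hop < window yields 15 s of overlap per pair
uniform_segments(90, 30, 15)
```

With hop_seconds of 15 and a window of 30, every moment of the video (after the first 15 seconds) is covered by two segments, which is what makes overlapping analysis possible.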
Shot Detection Segmentation
Shot detection segmentation uses computer vision to identify natural scene changes and transitions in video content. This approach creates segments that align with the video's actual content structure.

Key Parameters:
- Detector Strategy: Choose from adaptive or content-based detection
- Threshold: Detection sensitivity (strategy-specific, lower values = more sensitive)
- Min/Max Seconds: Constraints on segment duration to prevent overly short or long segments (defaults: 1 second minimum, 300 seconds maximum)
- Fill Gaps: When true (the default), gaps between detected shots are filled to ensure complete timeline coverage. Gaps ≥ min_seconds become their own segments (split by max_seconds if needed), and shorter gaps are merged into the nearest adjacent segment. Set to false to preserve only the raw detected shot boundaries.
- Adaptive Detector: Designed for dynamic footage with camera movement, panning, or action. Adapts to motion patterns to avoid false scene breaks during camera moves or fast action sequences. Examples: sports broadcasts, drone footage, handheld documentaries, action movies, live event recordings.
- Content Detector: Optimized for controlled footage with clear visual transitions. Focuses on color and lighting changes to identify clean cuts between distinct scenes or shots. Examples: studio interviews, corporate videos, educational content, product demos, scripted content with traditional editing.
Use Cases:
- Content-Aware Processing: Segments align with natural scene boundaries and visual transitions
- Narrative Structure: Perfect for videos with distinct scenes, like interviews, presentations, or storytelling
- Efficient Processing: Avoids artificial breaks mid-scene, preserving contextual integrity
- Dynamic Content: Adapts to the video’s natural rhythm rather than imposing fixed intervals
If shot detection finds no usable scene boundaries, Cloudglue falls back to uniform segmentation, using max_seconds as the window size if specified, otherwise defaulting to 20-second segments. This ensures you always receive usable segments rather than an error.
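The Fill Gaps behavior can be illustrated with a short sketch. This is a re-implementation for explanation only, not Cloudglue's actual code; merging a short gap into the preceding segment is a simplifying assumption (the service may pick whichever neighbor is nearest):

```python
# Fill gaps between detected shots so segments cover the full timeline.
# Gaps >= min_seconds become their own segments (split by max_seconds);
# shorter gaps are merged into an adjacent segment.
def fill_gaps(shots, duration, min_seconds=1.0, max_seconds=300.0):
    """shots: sorted (start, end) pairs; returns full-coverage segments."""
    segments = []
    cursor = 0.0
    for start, end in shots:
        gap = start - cursor
        if gap >= min_seconds:
            s = cursor
            while s < start:  # large gap becomes its own segment(s),
                segments.append([s, min(s + max_seconds, start)])
                s += max_seconds  # split by max_seconds if needed
        elif gap > 0:
            if segments:
                segments[-1][1] = start  # short gap: merge into previous
            else:
                start = cursor  # short leading gap: extend first shot back
        segments.append([start, end])
        cursor = end
    # Fill any trailing gap up to the end of the video.
    if duration - cursor >= min_seconds:
        segments.append([cursor, duration])
    elif duration > cursor and segments:
        segments[-1][1] = duration
    return segments

# A 0.5 s gap (< min_seconds) is absorbed; the 10 s tail becomes a segment:
fill_gaps([(0, 10), (10.5, 20)], duration=30)
# → [[0, 10.5], [10.5, 20], [20, 30]]
```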
Manual Segmentation
Manual segmentation lets you specify segment boundaries yourself. This is useful when you want to divide your video into specific segments based on your own criteria.

Key Parameters:
- Segments: An array of segments, each with a start and end time
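A minimal sketch of building a manual segments array from (start, end) pairs in seconds. The field names ("strategy", "start", "end") are illustrative assumptions, not necessarily the exact API schema:

```python
# Build and validate a manual segmentation config from (start, end) pairs.
# Field names here are illustrative, not the exact API schema.
def manual_segments(pairs):
    segments = []
    prev_end = 0
    for start, end in pairs:
        assert 0 <= start < end, "each segment needs start < end"
        assert start >= prev_end, "segments must not overlap"
        segments.append({"start": start, "end": end})
        prev_end = end
    return {"strategy": "manual", "segments": segments}

# The 30-minute cooking video from the introduction, as three chapters:
config = manual_segments([(0, 300), (300, 1200), (1200, 1800)])
```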
Narrative Segmentation
Narrative segmentation uses AI to identify logical chapter boundaries in your video based on content flow and topic transitions. This approach creates segments that align with the natural narrative structure of your content, similar to how a human would identify chapters in a book or documentary.

Key Parameters:
- Strategy: Choose between comprehensive (deeper analysis using vision language models) or balanced (faster analysis using multiple modalities)
- Prompt: Optional custom instructions to guide the AI's chapter identification
- Number of Chapters: Target number of chapters (optional - AI calculates based on duration if not specified)
- Min/Max Chapters: Constraints on the number of chapters to generate
- Comprehensive: Uses a vision language model (VLM) for deep analysis of video content. Best for complex videos where visual context is critical for understanding chapter boundaries. Only available for non-YouTube videos.
- Balanced: Uses a multi-modal approach combining transcript and visual analysis. Faster and more cost-effective while still providing intelligent chapter detection.
Use Cases:
- Content Chapters: Perfect for educational videos, tutorials, or documentaries where you want meaningful chapter markers
- Topic-Based Segmentation: Ideal for podcasts, interviews, or presentations that cover multiple topics
- Contextual Processing: When you need segments that preserve semantic context rather than arbitrary time-based divisions
- User Navigation: Create chapter markers that help users navigate to specific topics in your video
Narrative segmentation is not supported for YouTube URLs. Use the Segments API directly for YouTube video chapter detection.
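Putting the parameters above together, a narrative configuration might look like the following sketch. The key names are assumptions for illustration; consult the API reference for the exact schema:

```python
# Illustrative narrative-segmentation config; key names are assumptions.
narrative_config = {
    "strategy": "narrative",
    "mode": "balanced",  # "comprehensive" uses a VLM; non-YouTube only
    "prompt": "Split this lecture into self-contained topics",
    "min_chapters": 3,   # constrain how many chapters the AI may produce
    "max_chapters": 8,
}
```

If no chapter count is given, the AI calculates a target from the video's duration, so the min/max constraints are usually all you need.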
Temporal Constraints
All four segmentation strategies (uniform, shot-detector, manual, and narrative) support optional temporal constraints to focus processing on specific portions of your video:
- Start Time: Begin segmentation at a specific timestamp (useful for skipping intros or irrelevant content)
- End Time: Stop segmentation at a specific timestamp (useful for excluding outros or credits)
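The effect of these constraints can be sketched as clipping segments to a [start_time, end_time] window. This is an illustrative model of the behavior, not Cloudglue's implementation:

```python
# Clip segments to a [start_time, end_time] window; segments falling
# entirely outside the window are dropped.
def clip_segments(segments, start_time=0.0, end_time=float("inf")):
    clipped = []
    for s, e in segments:
        s2, e2 = max(s, start_time), min(e, end_time)
        if s2 < e2:  # keep only segments with time left inside the window
            clipped.append((s2, e2))
    return clipped

# Skip a 10 s intro and cut everything after 100 s:
clip_segments([(0, 30), (30, 60), (60, 120)], start_time=10, end_time=100)
# → [(10, 30), (30, 60), (60, 100)]
```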
Integration with Other Operations
Segmentation becomes particularly powerful when combined with other Cloudglue operations.

With Entity Extraction
Segmentation enables segment-level entity extraction, where you can identify when specific entities appear in your video.

With Media Description
Combine segmentation with media descriptions to get time-aligned descriptions that correspond to specific video segments.

In Collections
Segmentation can be applied at the collection level, providing consistent segmentation across all files in a collection.

Working with Segments
Once segmentation is complete, you can:
- Retrieve Segment Data: Get detailed information about each segment, including timestamps and any processed content
- Reference Specific Segments: Use segment IDs to reference specific portions of your video in other operations
- Analyze Segment-Level Results: Process results that are tied to specific temporal locations in your video
Choosing the Right Strategy
Use Uniform Segmentation when:
- You need consistent, predictable segment durations
- Processing performance and parallelization are priorities
- Content structure is less important than temporal consistency
- You’re building time-based indexing or search functionality
Use Shot Detection Segmentation when:
- Content structure and natural boundaries are important
- You’re working with visual content where scene changes matter (films, commercials)
- You want segments that respect visual transitions and scene changes
- Processing efficiency matters more than segment duration predictability
Use Narrative Segmentation when:
- You need intelligent chapter detection based on content meaning
- Working with educational content, tutorials, or presentations
- Semantic topic boundaries are more important than visual transitions
- You want AI-powered chapter markers for user navigation
- Processing podcasts, interviews, or discussions with multiple topics
Choose the Adaptive Detector for:
- Sports footage, action videos, or handheld camera work
- Content where the camera frequently moves, pans, or zooms
- Videos with fast motion that might create unwanted scene breaks
- Documentary-style footage with natural camera movement
Choose the Content Detector for:
- Professional interviews, presentations, or studio recordings
- Content with clear, intentional cuts between different scenes
- Static camera setups with controlled lighting
- Educational videos or webinars with minimal camera movement
Next Steps
To start using segmentation in your applications:
- Create Segments: Use the File Segmentation API to segment individual files
- Retrieve Segments: Use the Get Segmentation API to access segment data
- Combine with Processing: Include segmentation_config in your extraction or describe operations
- Collection-Level Segmentation: Set default_segmentation_config when creating collections
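To show where a segmentation config plugs in, here is a sketch of both placements. Only segmentation_config and default_segmentation_config come from this page; the surrounding request fields are illustrative placeholders, not the exact API schema:

```python
# A uniform segmentation config (parameters from the Uniform section above):
segmentation_config = {
    "strategy": "uniform",
    "window_seconds": 30,
    "hop_seconds": 30,
}

# Hypothetical extraction request carrying a per-operation config:
extract_request = {
    "url": "https://example.com/video.mp4",
    "prompt": "List the ingredients shown in each segment",
    "segmentation_config": segmentation_config,
}

# Hypothetical collection with a default applied to every file it contains:
collection_request = {
    "name": "cooking-videos",
    "default_segmentation_config": segmentation_config,
}
```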