A comprehensive guide to extracting data with Cloudglue
Video contains a wealth of information, but developers often need this information in a structured format that their applications can easily consume. While transcription gives you raw information from a video, entity extraction allows you to get precisely the structured data you need.
With Cloudglue’s Extract API, you can define exactly what information you want to extract and receive it in a format that’s ready for your database or application logic.
Cloudglue allows you to extract entities from both locally uploaded files and YouTube videos.
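Entities are structured pieces of information that can be extracted from videos. They come in two forms: video-level entities, which describe the video as a whole, and segment-level entities, which are tied to specific time segments.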
For example, if you’re analyzing product review videos, you might want to extract:
While transcription gives you comprehensive raw data from a video, entity extraction is better when you:
An entity schema defines what information you want to extract. You can specify it in two ways:
We provide a graphical tool to help you build and test your schemas:
For this example, we’ll analyze a political speech about tariffs. While the video below shows the source content, our analysis was performed using a local copy to enable full multimodal understanding:
Let’s look at a real-world example of extracting structured information from a political speech video. This example demonstrates how to combine schema definition with a clear prompt to get both video-level and segment-level information.
The extraction produces both video-level entities and time-segmented entities:
Video-Level Entities
Segment-Level Entities (20-40s segment)
This example demonstrates how Cloudglue can:
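As a rough sketch of how an application might consume a two-level result like this one, the snippet below looks up the segment that covers a given timestamp. The key names (`video_entities`, `segment_entities`, `start_time`, `end_time`, `entities`) and the placeholder values are assumptions made for illustration, not Cloudglue's documented response format.

```python
from typing import Any, Dict, Optional

# Hypothetical result shape; key names and values are illustrative assumptions.
result: Dict[str, Any] = {
    "video_entities": {"topic": "tariffs"},
    "segment_entities": [
        {"start_time": 0, "end_time": 20, "entities": {"speaker_visible": True}},
        {"start_time": 20, "end_time": 40, "entities": {"speaker_visible": False}},
    ],
}


def entities_at(result: Dict[str, Any], seconds: float) -> Optional[Dict[str, Any]]:
    """Return the entities of the segment covering `seconds`, or None."""
    for segment in result.get("segment_entities", []):
        # Treat each segment as the half-open interval [start_time, end_time).
        if segment["start_time"] <= seconds < segment["end_time"]:
            return segment["entities"]
    return None


video_topic = result["video_entities"]["topic"]  # applies to the whole video
segment = entities_at(result, 25)                # falls in the 20-40s segment
```

Video-level values are read once per video, while segment-level values need a time lookup, which is why the two are kept in separate structures here.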
For quick experiments or single extractions, use the direct extract endpoint:
For production use cases, create an entities collection to process multiple videos consistently:
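The idea behind a collection, one shared schema and prompt applied to every video, can be illustrated without calling any API. The sketch below is not Cloudglue's collections interface: `ExtractionConfig`, `build_jobs`, the job keys, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ExtractionConfig:
    """One schema and prompt, defined once and reused for every video."""
    prompt: str
    schema: Dict[str, Any]


def build_jobs(config: ExtractionConfig, videos: List[str]) -> List[Dict[str, Any]]:
    """Pair every video with the same extraction settings."""
    return [
        {"video": video, "prompt": config.prompt, "schema": config.schema}
        for video in videos
    ]


# Hypothetical prompt, schema, and file names.
config = ExtractionConfig(
    prompt="Extract the main topic of the speech.",
    schema={"topic": "string"},
)
jobs = build_jobs(config, ["speech_01.mp4", "speech_02.mp4"])
```

Because every job is derived from the same frozen configuration, results from different videos share one set of fields and can be stored or compared without per-video handling.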
Focus on Observable Attributes: Design your schema around information that can be:
Avoid requesting:
Be Specific: Only extract the information you actually need. More fields aren’t always better.
Use Prompts Effectively: Combine schemas with prompts to guide the extraction:
Test Diverse Content: Use the Extract Playground to test your schema against different types of videos.
Start Simple: Begin with a minimal schema and expand based on needs.
Consider Segments: Decide if you need information at the video level, segment level, or both.
Structure Based on Occurrence: Choose the right structure based on how entities appear in scenes:
Arrays ([]) for entities that can appear multiple times in a scene.
Objects ({}) for attributes that typically appear once per scene.
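To make the array-versus-object distinction concrete, here is a minimal sketch of a scene schema written as a Python dictionary, with a helper that reports which structure each field uses. The field names (`people`, `setting`, and their attributes) and the `"string"` type placeholders are assumptions for illustration, not a schema taken from Cloudglue's documentation.

```python
from typing import Any, Dict

# Hypothetical scene schema; field names are illustrative only.
scene_schema: Dict[str, Any] = {
    # Array: several people can appear in one scene.
    "people": [{"name": "string", "role": "string"}],
    # Object: a scene normally has a single setting.
    "setting": {"location": "string", "lighting": "string"},
}


def field_kinds(schema: Dict[str, Any]) -> Dict[str, str]:
    """Label each top-level field as 'array', 'object', or 'scalar'."""
    kinds = {}
    for name, spec in schema.items():
        if isinstance(spec, list):
            kinds[name] = "array"
        elif isinstance(spec, dict):
            kinds[name] = "object"
        else:
            kinds[name] = "scalar"
    return kinds


kinds = field_kinds(scene_schema)
```

A quick check like this can help catch a field that was modeled as a single object when the content regularly contains several instances of it.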
Use Collections: For production systems, use collections to ensure consistent extraction across multiple videos.
Check out our Extract Endpoint to get started with entity extraction. Get started on our platform.
At the moment, if you want to extract entities from a YouTube video directly, we only support extracting from speech content. For full multimodal entity extraction, download the video and upload it to Cloudglue.
When working with YouTube videos, consider adjusting your schema and prompt to focus on audio-centric information. Visual elements like camera_shots, backdrop, or on-screen symbols won’t be available through direct YouTube processing. For applications requiring comprehensive visual analysis, we recommend downloading the video first and using our Files API for complete multimodal understanding.
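One way to adapt a schema for direct YouTube processing is to drop the visual fields before sending it. The sketch below assumes a schema expressed as a Python dictionary; `camera_shots`, `backdrop`, and `symbols` follow the visual elements named above, while `audio_only_schema`, `key_claims`, and `speaker_tone` are hypothetical names used for illustration.

```python
from typing import Any, Dict, FrozenSet

# Fields that depend on visual analysis (names follow the examples above).
VISUAL_FIELDS: FrozenSet[str] = frozenset({"camera_shots", "backdrop", "symbols"})


def audio_only_schema(
    schema: Dict[str, Any],
    visual_fields: FrozenSet[str] = VISUAL_FIELDS,
) -> Dict[str, Any]:
    """Return a copy of `schema` without the fields that need visual input."""
    return {name: spec for name, spec in schema.items() if name not in visual_fields}


# Hypothetical full schema for a locally uploaded video.
full_schema = {
    "key_claims": ["string"],
    "speaker_tone": "string",
    "camera_shots": ["string"],
    "backdrop": "string",
}

youtube_schema = audio_only_schema(full_schema)
```

Keeping the full schema as the source of truth and deriving the speech-only variant from it means the two stay in sync when fields are added later.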