A comprehensive guide to extracting data with Cloudglue
Video contains a wealth of information, but developers often need this information in a structured format that their applications can easily consume. While transcription gives you raw information from a video, entity extraction allows you to get precisely the structured data you need.
With Cloudglue’s Extract API, you can define exactly what information you want to extract and receive it in a format that’s ready for your database or application logic.
Cloudglue allows you to extract entities from both locally uploaded files and
YouTube videos.
For this example, we’ll analyze a political speech about tariffs. While the video below shows the source content, our analysis was performed using a local copy to enable full multimodal understanding:
Let’s look at a real-world example of extracting structured information from a political speech video. This example demonstrates how to combine schema definition with a clear prompt to get both video-level and segment-level information.
Extract the following structured information from C-SPAN videos:
1. SPEAKER: Identify main speakers by name and title
2. DISCOURSE: Determine the main topic, extract notable key phrases, identify rhetorical techniques with examples, and document stated policy positions.
3. REFERENCES: Record any executive orders, legislation (with names/numbers), or agreements mentioned.
4. VISUAL: Note on-screen text (chyrons), backdrop elements, types of camera shots used, and significant visual symbols.
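A schema to pair with this prompt might look like the sketch below, written in the same by-example style as the product schema used elsewhere in this guide. The grouping and most field names are illustrative assumptions rather than a confirmed Cloudglue schema; only `camera_shots` and `backdrop` are names this guide itself mentions. Whether plain lists of strings such as `["string"]` are accepted is also an assumption.

```python
# Hypothetical schema for the political-speech prompt above.
# Field names are illustrative assumptions, not a confirmed Cloudglue schema.
speech_schema = {
    # 1. SPEAKER: several speakers may appear, so this is a list.
    "speakers": [{"name": "string", "title": "string"}],
    # 2. DISCOURSE: one topic per video, with repeatable details.
    "discourse": {
        "main_topic": "string",
        "key_phrases": ["string"],
        "rhetorical_techniques": [{"technique": "string", "example": "string"}],
        "policy_positions": ["string"],
    },
    # 3. REFERENCES: executive orders, legislation, or agreements mentioned.
    "references": [{"type": "string", "name_or_number": "string"}],
    # 4. VISUAL: only meaningful when the video itself is analyzed.
    "visual": {
        "on_screen_text": ["string"],
        "backdrop": "string",
        "camera_shots": ["string"],
        "visual_symbols": ["string"],
    },
}
```

Each numbered item in the prompt maps to one top-level key, which keeps the prompt and the schema easy to review side by side.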
For quick experiments or single extractions, use the direct extract endpoint:
JavaScript SDK
// Define your schema
const schema = {
  products: [
    {
      name: "string",
      price: "string",
      rating: "string"
    }
  ]
};

// Create an extract job
const extractJob = await client.extract.createExtract(fileUri, {
  schema: schema,
  // Optionally include a prompt to guide the extraction
  prompt: "Extract product details including exact prices and ratings"
});

// Get the results
const result = await client.extract.getExtract(extractJob.job_id);
console.log(result.data);
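Because the extraction is created as a job and then fetched by its ID, the result may not be ready on the first fetch. This guide does not say whether the SDK waits for completion on its own, so the Python sketch below shows a generic polling helper as one possible approach. It takes the fetch call and the completion check as arguments, which means it assumes no particular SDK method, status value, or response field. The name `wait_for_result` and its parameters are hypothetical.

```python
import time


def wait_for_result(fetch_job, is_done, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_job() until is_done(job) is true, then return that job.

    fetch_job and is_done are supplied by the caller, so no particular
    SDK method or response field is assumed here.
    """
    for attempt in range(max_attempts):
        job = fetch_job()
        if is_done(job):
            return job
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

In practice, `fetch_job` would wrap whatever call your SDK version exposes for retrieving a job, and `is_done` would inspect whichever field that response uses to report completion.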
Focus on Observable Attributes: Design your schema around information that can be:
Visually seen in the video
Read from on-screen text
Heard in speech or narration
Understood from actions and events
Avoid requesting:
Technical metadata (e.g., bitrate, duration)
Highly subjective interpretations
Information that requires external context
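One way to catch the first kind of field before submitting a job is a small check over the schema, sketched below in Python. The helper name `find_discouraged_fields` and the list of metadata-like names are assumptions made for illustration; the check is not part of the Cloudglue SDK and cannot detect subjective or context-dependent fields.

```python
# Field names treated as technical metadata in this sketch
# (an assumed, non-exhaustive list).
DISCOURAGED_FIELDS = {"bitrate", "duration", "codec", "resolution", "frame_rate"}


def find_discouraged_fields(schema, path=""):
    """Return dotted paths of schema keys whose names look like technical metadata."""
    found = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}.{key}" if path else key
            if key.lower() in DISCOURAGED_FIELDS:
                found.append(child)
            # Keep walking so nested objects and lists are checked too.
            found.extend(find_discouraged_fields(value, child))
    elif isinstance(schema, list):
        for item in schema:
            found.extend(find_discouraged_fields(item, f"{path}[]"))
    return found
```

Running it on a draft schema returns the paths worth reconsidering, for example a `bitrate` key nested inside a `products` list.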
Be Specific: Only extract the information you actually need. More fields aren’t always better.
Use Prompts Effectively: Combine schemas with prompts to guide the extraction:
// JavaScript SDK
const extractJob = await client.extract.createExtract(fileUri, {
  schema: productSchema,
  prompt: "Focus on extracting exact prices in USD and ratings out of 5 stars that are explicitly shown or mentioned"
});
# Python SDK
extract_job = client.extract.create(
    url=file_uri,
    schema=product_schema,
    prompt="Focus on extracting exact prices in USD and ratings out of 5 stars that are explicitly shown or mentioned"
)
Test Diverse Content: Use the Extract Playground to test your schema against different types of videos.
Start Simple: Begin with a minimal schema and expand based on needs.
Consider Segments: Decide if you need information at the video level, segment level, or both.
Structure Based on Occurrence: Choose the right structure based on how entities appear in scenes:
Use lists ([]) for entities that can appear multiple times in a scene:
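As a small illustration of that choice, the Python sketch below declares one repeatable field as a list and one field as a single value. The field names are hypothetical, and treating `setting` as occurring once per scene is an assumption made for the example.

```python
# Hypothetical scene schema: field names are illustrative only.
scene_schema = {
    # A scene can show several products, so this is a list of objects.
    "products": [{"name": "string", "price": "string"}],
    # Assumed to occur once per scene, so a single value is enough.
    "setting": "string",
}


def repeatable_fields(schema):
    """Return the top-level keys declared as lists (zero or more occurrences)."""
    return [key for key, value in schema.items() if isinstance(value, list)]
```

Listing the repeatable fields this way is a quick review step: anything that should be able to occur more than once but is missing from the result is probably declared with the wrong shape.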
At the moment, if you want to extract entities from a YouTube video directly,
we only support extracting from speech content. For full multimodal entity
extraction, download the video and upload it to Cloudglue.
// Extract entities from YouTube video speech
const result = await client.extract.createExtract(
  'https://www.youtube.com/watch?v=VIDEO_ID',
  {
    schema: mySchema,
    prompt: "Extract key information from the speaker's content"
  }
);
When working with YouTube videos, consider adjusting your schema and prompt to focus on audio-centric information. Visual elements like camera_shots, backdrop, or on-screen symbols won’t be available through direct YouTube processing. For applications requiring comprehensive visual analysis, we recommend downloading the video first and using our Files API for complete multimodal understanding.
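The adjustment described above can be sketched as a helper that drops visual fields from a schema when the source is a YouTube URL. This is a minimal Python sketch under stated assumptions: `schema_for_source`, `is_youtube_url`, the host list, and the default `visual` key are all hypothetical names chosen for illustration, and only top-level keys are filtered.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch (an assumed list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """True when the URL's host is one of the YouTube hosts above."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def schema_for_source(url, schema, visual_keys=("visual",)):
    """Drop top-level visual keys when the source is a YouTube URL.

    visual_keys is a caller-chosen list of schema keys that depend on
    visual content; the default is only an example.
    """
    if not is_youtube_url(url):
        return dict(schema)
    return {key: value for key, value in schema.items() if key not in visual_keys}
```

The same full schema can then be reused for uploaded files, while YouTube sources receive only the fields that speech content can answer.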