Extract Structured Data from Videos
Learn how to extract and structure data from your video content
Extracting Structured Data from Cooking Videos: A Hands-On Guide
Introduction
Video content contains a wealth of information that, when properly structured, can power advanced search, analytics, and insights. In this guide, we’ll walk through extracting structured data from cooking videos using Cloudglue’s entity extraction capabilities.
By the end, you’ll learn how to:
- Create an entity collection for cooking videos
- Define a structured schema for recipe information
- Extract detailed data from YouTube cooking videos
- Analyze and visualize patterns across multiple videos
Prerequisites
Creating a Recipe Entity Collection
First, we’ll create a collection specifically for cooking videos with a schema that captures recipe details, equipment, actions, ingredients, and cooking phases.
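As a rough illustration of what such a schema might cover, the sketch below describes the five areas as a plain Python dictionary. Every field name here (`recipe`, `equipment`, `ingredients`, `segments`, `phase`, `actions`) is an assumption made for this example and is not taken from Cloudglue's documentation; check the API reference for the schema format the service actually expects.

```python
# Hypothetical schema sketch: field names and type labels are illustrative
# assumptions, not Cloudglue's required schema format.
recipe_schema = {
    "recipe": {
        "title": "string",
        "cuisine": "string",
        "meal_type": "string",
    },
    "equipment": [{"name": "string"}],
    "ingredients": [{"name": "string", "quantity": "string"}],
    "segments": [
        {
            "start_time": "number",  # seconds from the start of the video
            "end_time": "number",
            "phase": "string",       # e.g. prep, active cooking, explanation, tasting
            "actions": ["string"],   # cooking actions seen in this segment
        }
    ],
}


def top_level_fields(schema: dict) -> list[str]:
    """Return the schema's top-level field names in sorted order."""
    return sorted(schema)


print(top_level_fields(recipe_schema))
```

Keeping actions and phases on per-segment records, rather than at the video level, is what makes the time-based analyses later in the guide possible.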
Adding YouTube Cooking Videos
Now, let’s add 10 different cooking videos to our collection. We’ll choose a variety of cuisines and meal types.
Waiting for Processing and Retrieving Entities
The extraction process runs asynchronously. Let’s monitor and wait for the extraction to complete.
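A minimal polling sketch is shown below. It deliberately takes a `get_status` callable instead of calling a specific SDK method, because the exact Cloudglue call and its status values are not shown here; the `"completed"` and `"failed"` strings are assumptions and should be replaced with whatever the API actually returns.

```python
import time
from typing import Callable


def wait_until_complete(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes.

    The terminal status names are hypothetical placeholders.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction still '{status}' after {timeout} seconds")
        time.sleep(poll_interval)
```

In practice, `get_status` would wrap the SDK call that fetches one video's extraction status, and the helper would be called once per video in the collection.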
Extracting and Aggregating the Data
Now that our videos are processed, let’s extract the entities and organize them for analysis.
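One way to organize the results is to flatten each video's nested entities into flat rows, one list for ingredients and one for segments. The sketch below assumes each video's extracted entities have already been fetched into a dictionary with hypothetical `title`, `cuisine`, `ingredients`, and `segments` keys; the real response shape may differ.

```python
def flatten_entities(videos: list[dict]) -> tuple[list[dict], list[dict]]:
    """Turn nested per-video entities into flat ingredient and segment rows."""
    ingredient_rows, segment_rows = [], []
    for video in videos:
        base = {"video": video["title"], "cuisine": video["cuisine"]}
        for ingredient in video.get("ingredients", []):
            ingredient_rows.append({**base, "ingredient": ingredient["name"]})
        for segment in video.get("segments", []):
            segment_rows.append(
                {
                    **base,
                    "start_time": segment["start_time"],
                    "end_time": segment["end_time"],
                    "phase": segment["phase"],
                    "n_actions": len(segment.get("actions", [])),
                }
            )
    return ingredient_rows, segment_rows


# Made-up sample data in the assumed shape, for illustration only.
sample_videos = [
    {
        "title": "Carbonara",
        "cuisine": "Italian",
        "ingredients": [{"name": "spaghetti"}, {"name": "egg"}],
        "segments": [
            {"start_time": 0, "end_time": 30, "phase": "prep", "actions": ["chop", "grate"]},
        ],
    }
]

ingredient_rows, segment_rows = flatten_entities(sample_videos)
```

Both lists can be passed directly to `pandas.DataFrame(...)` to get tables that are ready for grouping and plotting.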
Analyzing the Data
1. Ingredient Count Comparison by Cuisine Type
Let’s compare the average number of ingredients used in each cuisine type.
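A possible way to compute this with pandas is sketched below: count distinct ingredients per video first, then average those counts within each cuisine. The column names (`cuisine`, `video`, `ingredient`) and the sample rows are made up for illustration and are not results from real videos.

```python
import pandas as pd

# Made-up sample data: (cuisine, video) -> ingredients.
sample = {
    ("Italian", "Carbonara"): ["spaghetti", "egg", "pecorino", "guanciale"],
    ("Italian", "Margherita Pizza"): ["dough", "mozzarella"],
    ("Thai", "Pad Thai"): ["rice noodles", "shrimp", "tamarind", "peanuts", "lime"],
}
ingredients = pd.DataFrame(
    [
        {"cuisine": cuisine, "video": video, "ingredient": item}
        for (cuisine, video), items in sample.items()
        for item in items
    ]
)

# Step 1: number of distinct ingredients in each video.
per_video = ingredients.groupby(["cuisine", "video"])["ingredient"].nunique()

# Step 2: average that count across the videos of each cuisine.
avg_by_cuisine = per_video.groupby(level="cuisine").mean().sort_values(ascending=False)
print(avg_by_cuisine)
```

Averaging per-video counts, rather than counting rows per cuisine, keeps a cuisine with many videos from appearing to use more ingredients than it does. Calling `avg_by_cuisine.plot(kind="bar")` would then produce a bar chart if matplotlib is installed.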
2. Phase Duration Analysis
Let’s analyze how much time each video spends in different cooking phases.
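The percentages behind such a chart can be computed by summing segment durations per phase and dividing by the total. The sketch below assumes each segment carries hypothetical `start_time`, `end_time` (in seconds), and `phase` fields; the sample segments are invented for illustration.

```python
from collections import defaultdict


def phase_percentages(segments: list[dict]) -> dict[str, float]:
    """Return the share of total segment time spent in each phase, in percent."""
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["phase"]] += segment["end_time"] - segment["start_time"]
    video_length = sum(totals.values())
    if video_length == 0:
        return {}
    return {phase: round(100 * duration / video_length, 1) for phase, duration in totals.items()}


# Made-up segments for a single 100-second video.
sample_segments = [
    {"start_time": 0, "end_time": 20, "phase": "explanation"},
    {"start_time": 20, "end_time": 50, "phase": "prep"},
    {"start_time": 50, "end_time": 90, "phase": "active_cooking"},
    {"start_time": 90, "end_time": 100, "phase": "tasting"},
]

print(phase_percentages(sample_segments))
```

Running this once per video gives one row of percentages per video, which maps naturally onto a stacked bar chart.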
This visualization reveals how different cooking videos distribute their time across various cooking phases. We can observe that:
- Active cooking takes up a significant portion (17-38%) of the videos
- Videos vary considerably in how they balance explanation (16-21%) and prep (17-33%)
- Some videos allocate a small portion (around 8%) to the tasting phase
- The distribution of phases can indicate the style and target audience of a cooking video:
  - Videos with more prep time may be better for beginners
  - Videos with more active cooking focus on the technical aspects
  - Videos with more explanation time provide more context and background
This type of analysis helps content creators understand the structure of successful cooking videos and allows viewers to find videos that match their preferred learning style.
3. Action Complexity Timeline for a Single Video
Let’s select one representative video and chart how the complexity (measured by actions per segment) changes over time.
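As a sketch of the underlying computation, the snippet below counts distinct actions in each segment and locates the peak. It assumes segments with hypothetical `start_time`, `phase`, and `actions` fields; the sample data is invented and does not come from a real video.

```python
def complexity_timeline(segments: list[dict]) -> list[tuple[float, str, int]]:
    """Return (start_time, phase, distinct action count) per segment, in time order."""
    ordered = sorted(segments, key=lambda segment: segment["start_time"])
    return [
        (segment["start_time"], segment["phase"], len(set(segment["actions"])))
        for segment in ordered
    ]


def peak_segment(timeline: list[tuple[float, str, int]]) -> tuple[float, str, int]:
    """Return the timeline point with the most distinct actions."""
    return max(timeline, key=lambda point: point[2])


# Made-up segments, deliberately out of order and with a repeated action.
sample_segments = [
    {"start_time": 60, "phase": "active_cooking", "actions": ["saute", "stir", "stir"]},
    {"start_time": 0, "phase": "explanation", "actions": []},
    {"start_time": 20, "phase": "prep", "actions": ["chop", "peel", "slice", "measure"]},
]

timeline = complexity_timeline(sample_segments)
print(timeline)
print(peak_segment(timeline))
```

Plotting the third element of each tuple against the first gives the complexity curve, and the phase label can be used to color each point.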
This visualization shows how cooking action complexity changes throughout a single video. By mapping the number of distinct cooking actions performed in each segment, we can identify:
- The peak complexity moments in the video (occurring at specific times with higher numbers of distinct actions)
- How cooking phases relate to action complexity (with notable activity in both prep and active cooking phases)
- The rhythm of the recipe, with multiple spikes of 4-5 actions throughout the video interspersed with less complex segments
- Potential points where viewers might need to pause or rewatch to follow along
In the example above, we can see that this video has several periods of moderate complexity with 3-4 actions, with peak complexity reaching 7 distinct cooking actions during the prep phase. This kind of insight can help content creators design more balanced cooking videos or add explanations at high-complexity points.
Exploring Advanced Queries
Besides the visualizations above, you can perform more targeted analyses with simple pandas queries:
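A few queries of this kind are sketched below against a small, made-up segments table. The column names (`video`, `phase`, `start_time`, `n_actions`) are assumptions for this example; adapt them to whatever columns your own DataFrame has.

```python
import pandas as pd

# Made-up segment rows for illustration only.
segments = pd.DataFrame(
    [
        {"video": "Carbonara", "phase": "prep", "start_time": 0, "n_actions": 4},
        {"video": "Carbonara", "phase": "active_cooking", "start_time": 60, "n_actions": 2},
        {"video": "Pad Thai", "phase": "prep", "start_time": 0, "n_actions": 5},
        {"video": "Pad Thai", "phase": "active_cooking", "start_time": 45, "n_actions": 3},
    ]
)

# Segments where a viewer may need to pause: four or more actions at once.
busy_segments = segments[segments["n_actions"] >= 4]

# Average number of actions per segment in each phase.
avg_actions_by_phase = segments.groupby("phase")["n_actions"].mean()

# The video with the most actions overall.
busiest_video = segments.groupby("video")["n_actions"].sum().idxmax()

print(busy_segments)
print(avg_actions_by_phase)
print(busiest_video)
```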
Conclusion
In this guide, we’ve demonstrated how to:
- Create a structured extraction schema for cooking videos
- Process multiple YouTube videos with Cloudglue’s entity extraction
- Analyze the extracted data to uncover insights about ingredients, cooking phases, and action complexity
- Visualize the results using matplotlib
The structured data extraction capabilities of Cloudglue make it possible to transform unstructured video content into actionable insights. This approach can be extended to any domain where videos contain valuable information that needs to be structured for analysis.
Next Steps
- Try different extraction schemas for other video domains (e.g., sports analysis, product reviews)
- Implement real-time dashboards using extracted data
- Build a search interface that lets users find specific segments across videos
- Create a recommendation system based on extracted entities