> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cloudglue.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Introduction

> Turn video data into a reusable, multimodal context layer for AI agents & applications.

## What is Cloudglue?

[**Cloudglue**](https://cloudglue.dev?ref=docs) is the video context layer for agents and applications. Cloudglue lets developers search, ask questions about, analyze, and extract structured data from individual videos and across thousands of hours of video as a unified, queryable corpus.

Designed for simplicity, scale, and fidelity, Cloudglue unifies speech, diarization, visual understanding, sound, and on-screen text into simple, composable APIs, so you can enable **video Q\&A**, **semantic search**, and **structured data extraction with full citations** in just a few lines of code, without building your own video-understanding stack.

Whether you're building **AI agent workflows** or **creative tools**, or **analyzing meeting recordings**, Cloudglue makes video queryable and actionable for any AI system. Cloudglue handles the infrastructure so you can focus on building features that matter to your users.

## Get Started in 3 Steps

1. [Sign up for free](https://app.cloudglue.dev/auth/sign-up) and get your API key from the dashboard.
2. Install our Python SDK or JavaScript SDK, use the MCP Server for Claude, Cursor, or Windsurf integration, or test directly in the [Playground](https://app.cloudglue.dev/home/playground).
3. Upload your first video, then extract structured data or chat across your videos in minutes.

## Core Features

### Video Document Parsing

Foundational APIs that transform unstructured video and audio into structured, queryable context.

* Get a comprehensive moment-by-moment description of a video, including transcript, diarization, visual descriptions, audio descriptions, sound, on-screen text, and more. Perfect for capturing every detail of a video.
* Extract structured data from videos at scale, across modalities, using a prompt or custom schema. This makes videos easy to program against, query, and categorize in your application.
* Split videos into meaningful parts with segmentation options such as intelligent shot detection and narrative (chapter) segmentation. This turns videos into logical sequences.

### Video Reasoning

Higher-level APIs that enable multimodal search, chat, and reasoning directly over video content.

* Add semantic search over videos and segments with natural-language queries. Enable this in your application with just a few lines of code.
* Add conversational AI that can query, compare, and reason across hundreds of videos, complete with full citations, with just a few lines of code.
* Next-gen conversational API with streaming, entity-backed knowledge, multi-turn support, and background processing.

## What Makes Cloudglue Different

* **Multimodal AI**: We don't just transcribe speech; we understand the full context, including visual content, audio descriptions, on-screen text, and diarization.
  * *Prefer speech-only?* You can disable multimodality and use transcripts alone.
* **Scale**: Built to handle hour-long (or longer) videos, and to reason across hundreds or even thousands of videos at once, all using the same simple primitives.
* **Developer-First**: Clean APIs, comprehensive SDKs, and tools built for developers.
* **Robust**: Designed for production workloads with reliable performance across large video datasets.
* **Real-time Integration**: Rich partner ecosystem for building integrations, including MCP server support for direct AI assistant integration.
* **Backed by Research**: Cloudglue continuously integrates the latest advancements in multimodal AI, so as foundation models improve, your application benefits from the latest advances as well. Our infrastructure is built and maintained by a team that actively publishes research in large-scale video and audio understanding.

## Quick Example

Here's how easy it is to extract structured data from any video:

```python Python
from cloudglue import CloudGlue

client = CloudGlue()

# Upload and extract
uploaded = client.files.upload(
    'path/to/local/video.mp4',
    wait_until_finish=True
)

extraction = client.extract.run(
    url=uploaded.uri,
    prompt="Extract all speakers and main topics discussed",
    schema={"speakers": ["string"], "topics": ["string"]}
)

print(extraction.data)
# {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
```

```typescript JavaScript
import { CloudGlue } from '@aviaryhq/cloudglue-js';
import * as fs from 'fs';
import * as path from 'path';

const client = new CloudGlue();

const filePath = 'path/to/video.mp4';
const fileBuffer = await fs.promises.readFile(filePath);
const file = new File([fileBuffer], path.basename(filePath));
const uploadResult = await client.files.uploadFile({ file });
const fileDetails = await client.files.waitForReady(uploadResult.id);

const extraction = await client.extract.createExtract(fileDetails.uri, {
  prompt: 'Extract all speakers and main topics discussed',
  schema: { speakers: ['string'], topics: ['string'] },
});
const extractionResult = await client.extract.waitForReady(extraction.job_id);

console.log(extractionResult.data);
```

## Popular Use Cases

* Build intelligent chatbots that can answer questions about video content, perfect for training materials, meetings, and educational content.
* Extract specific information like product details, people, locations, or any custom data schema from video content at scale.
* Create searchable knowledge bases on video recordings, making hours of content instantly accessible and queryable.

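
For the structured-extraction use case, it can help to check that a returned payload has the shape you asked for before your application relies on it. The following is a minimal sketch, not part of the Cloudglue SDK: `matches_schema` is a hypothetical helper, and it assumes only the simplified schema notation shown in the Quick Example (`"string"` leaves, one-element lists, and nested objects). The sample `result` mirrors the example output above rather than a real API response.

```python
from typing import Any


# Hypothetical helper; not part of the Cloudglue SDK.
# Assumes the simplified schema notation from the Quick Example:
# "string" for a text leaf, [item_schema] for a list, {...} for an object.
def matches_schema(value: Any, schema: Any) -> bool:
    """Return True if `value` has the shape described by `schema`."""
    if schema == "string":
        return isinstance(value, str)
    if isinstance(schema, list):
        # A one-element list describes the schema of every item.
        if len(schema) != 1 or not isinstance(value, list):
            return False
        return all(matches_schema(item, schema[0]) for item in value)
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return False
        # Every declared field must be present and well-formed.
        return all(
            key in value and matches_schema(value[key], sub)
            for key, sub in schema.items()
        )
    return False  # Unknown schema notation


schema = {"speakers": ["string"], "topics": ["string"]}
result = {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
print(matches_schema(result, schema))
```

In practice you would pass `extraction.data` in place of the sample `result`. A check like this is deliberately strict about missing fields, so a partial extraction surfaces early instead of failing later in your pipeline.
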
## Capabilities at a Glance

### Collections & Organization

* [**Entity Collections**](/core-concepts/entity-collection) - Process multiple videos with consistent schemas
* [**Media Description Collections**](/core-concepts/media-description-collection) - Organize videos for searchable multimodal transcriptions
* [**Collection Chat**](/core-concepts/chat-completions) - Have conversations across entire video libraries

### Integrations & Tools

* [**MCP Server**](/getting-started/mcp-server) - Direct integration with Claude Desktop and Cursor
* [**Playground**](https://app.cloudglue.dev/home/playground) - Test and experiment with video processing
* [**Schema Builder**](https://app.cloudglue.dev/tools/extract-schema-helper) - Visual tool for creating extraction schemas
* [**Webhooks**](/getting-started/webhooks) - Real-time processing notifications

### Multimodal Understanding

* **Speech Transcription** - Accurate speech-to-text with speaker identification
* **Visual Scene Analysis** - Detailed descriptions of what's happening visually
* **Scene Text Recognition** - Extract text visible on screen (captions, presentations, etc.)
* **Media Integrations** - Process videos directly from YouTube, TikTok, and Loom URLs
* **Audio Description** - Extract audio descriptions from video content
* **Face Detection and Matching** - Find videos with a given face

## Next Steps

Choose your path based on what you want to accomplish:

* Start with our setup guide to get your API key and SDK installed.
* Explore detailed use cases with step-by-step implementations.
* Dive into the full API documentation and endpoint details.
* Test video processing directly in your browser without any code.

## Resources

* **API Documentation**: [Full API Reference](https://docs.cloudglue.dev/api-reference)
* **SDKs**: [JavaScript](https://docs.cloudglue.dev/sdks/javascript) • [Python](https://docs.cloudglue.dev/sdks/python)
* **Tools**: [Playground](https://app.cloudglue.dev/home/playground) • [Schema Builder](https://app.cloudglue.dev/tools/extract-schema-helper)
* **Need an SDK or integration?** [Let us know!](mailto:hello@cloudglue.dev)