Introduction - Cloudglue

What is Cloudglue?

Cloudglue is the video context layer for agents and applications. It lets developers search, ask questions about, analyze, and extract structured data from individual videos, and treat thousands of hours of video as a single queryable corpus. Designed for simplicity, scale, and fidelity, Cloudglue unifies speech, diarization, visual understanding, sound, and on-screen text behind simple, composable APIs, so you can add video Q&A, semantic search, and structured data extraction with full citations in a few lines of code, without building your own video-understanding stack. Whether you’re building AI agent workflows or creative tools, or analyzing meeting recordings, Cloudglue makes video queryable and actionable for any AI system and handles the infrastructure so you can focus on the features that matter to your users.

Get Started in 3 Steps

Get your API key

Install an SDK, or use MCP or Playground

Python SDK

Install our Python SDK

JavaScript SDK

Install our JavaScript SDK

MCP Server

Use the MCP Server for Claude, Cursor, or Windsurf integration.

Playground

Test directly in Playground.

Upload video and extract

Upload your first video, then extract structured data or chat across your videos in minutes.

Core Features

Video Document Parsing

Foundational APIs that transform unstructured video and audio into structured, queryable context.

Describe

Get a comprehensive, moment-by-moment description of a video, including transcript, diarization, visual descriptions, audio descriptions, sound, on-screen text, and more. Useful when you need every detail of a video.

Extract

Extract structured data from videos at scale, across modalities, using a prompt or a custom schema. This makes videos easy to program against, query, and categorize in your application.
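As a rough illustration of what categorizing extracted data can look like, the sketch below tallies topics across several extraction results. It assumes each result is a plain dict shaped like the output in the Quick Example on this page (`speakers` and `topics` lists). The sample data and the helper name `count_topics` are hypothetical and not part of the Cloudglue SDK.

```python
from collections import Counter


def count_topics(results):
    """Count how many videos mention each topic.

    `results` is a list of dicts with a "topics" list, the shape
    assumed here for illustration.
    """
    counts = Counter()
    for result in results:
        # Use a set so a topic repeated within one video counts once.
        counts.update(set(result.get("topics", [])))
    return counts


# Hypothetical extraction results for three videos.
sample_results = [
    {"speakers": ["John Smith"], "topics": ["AI", "Marketing"]},
    {"speakers": ["Sarah Johnson"], "topics": ["AI"]},
    {"speakers": [], "topics": ["Pricing", "AI", "AI"]},
]

topic_counts = count_topics(sample_results)
print(topic_counts.most_common(1))  # [('AI', 3)]
```

Counting per video rather than per mention keeps one long recording from dominating the tally; drop the `set(...)` call if raw mention counts are what you need.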

Segment

Split videos into meaningful parts with segmentation options such as intelligent shot detection and narrative chapters, turning videos into logical sequences.

Video Reasoning

Higher-level APIs that enable multimodal search, chat, and reasoning directly over video content.

Search

Add semantic search over videos and segments with natural-language queries. Enable this in your application with just a few lines of code.

Chat Completion

Add conversational AI that can query, compare, and reason across hundreds of videos, complete with full citations, with just a few lines of code.

Responses API

Next-gen conversational API with streaming, entity-backed knowledge, multi-turn support, and background processing.

What Makes Cloudglue Different

Multimodal AI: We don’t just transcribe speech. We also use visual content, audio descriptions, on-screen text, and diarization as context.
- Prefer speech-only? You can disable multimodality and use transcripts alone.
Scale: Built to handle hour-long (or longer) videos, and reason across hundreds or even thousands of videos at once, all using the same simple primitives.
Developer-First: Clean APIs, comprehensive SDKs, and tools built for developers.
Robust: Designed for production workloads with reliable performance across large video datasets.
Real-time Integration: Rich partner ecosystem for building integrations, including MCP server support for direct AI assistant integration.
Backed by Research: Cloudglue continuously integrates the latest advances in multimodal AI, so as foundation models improve, your application benefits too. Our infrastructure is built and maintained by a team that actively publishes research in large-scale video and audio understanding.

Quick Example

Here’s how easy it is to extract structured data from any video:

from cloudglue import CloudGlue

client = CloudGlue()

# Upload and extract
uploaded = client.files.upload(
    'path/to/local/video.mp4',
    wait_until_finish=True
)

extraction = client.extract.run(
    url=uploaded.uri,
    prompt="Extract all speakers and main topics discussed",
    schema={"speakers": ["string"], "topics": ["string"]}
)

print(extraction.data)
# {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
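Before passing extracted data downstream, it can help to check that it has the shape you asked for. The following is a minimal sketch, assuming the shorthand schema from the example above, where each field maps to a one-element list naming the item type. `matches_schema` is a hypothetical helper, not an SDK function, and it only understands the `"string"` item type.

```python
def matches_schema(data, schema):
    """Return True if `data` has every schema field as a list of strings.

    Only handles the shorthand used on this page, e.g.
    {"speakers": ["string"]}; anything else is rejected.
    """
    if not isinstance(data, dict):
        return False
    for field, spec in schema.items():
        if spec != ["string"]:
            # Unknown shorthand: be conservative and reject.
            return False
        value = data.get(field)
        if not isinstance(value, list):
            return False
        if not all(isinstance(item, str) for item in value):
            return False
    return True


schema = {"speakers": ["string"], "topics": ["string"]}
good = {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
bad = {"speakers": "John Smith", "topics": ["AI"]}

print(matches_schema(good, schema))  # True
print(matches_schema(bad, schema))   # False
```

In a real application you would pass `extraction.data` in place of the sample dicts, assuming it is a plain dict as the printed output above suggests.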

Popular Use Cases

Video Q&A Chatbots

Build intelligent chatbots that can answer questions about video content, perfect for training materials, meetings, and educational content.

Structured Data Extraction

Extract specific information like product details, people, locations, or any custom data schema from video content at scale.

Video Knowledge Bases

Create searchable knowledge bases from video recordings, making hours of content instantly accessible and queryable.

Capabilities at a Glance

Collections & Organization

Entity Collections - Process multiple videos with consistent schemas
Media Description Collections - Organize videos for searchable multimodal transcriptions
Collection Chat - Have conversations across entire video libraries

Integrations & Tools

MCP Server - Direct integration with Claude Desktop and Cursor
Playground - Test and experiment with video processing
Schema Builder - Visual tool for creating extraction schemas
Webhooks - Real-time processing notifications

Multimodal Understanding

Speech Transcription - Accurate speech-to-text with speaker identification
Visual Scene Analysis - Detailed descriptions of what’s happening visually
Scene Text Recognition - Extract text visible on screen (captions, presentations, etc.)
Media Integrations - Process videos directly from YouTube, TikTok, and Loom URLs
Audio Description - Extract audio descriptions from video content
Face Detection and Matching - Find videos with a given face

Next Steps

Choose your path based on what you want to accomplish:

New to Cloudglue?

Start with our setup guide to get your API key and SDK installed.

See Examples

Explore detailed use cases with step-by-step implementations.

API Reference

Dive into the full API documentation and endpoint details.

Try the Playground

Test video processing directly in your browser without any code.

Resources

API Documentation: Full API Reference
SDKs: JavaScript • Python
Tools: Playground • Schema Builder
Need an SDK or integration? Let us know!
