
What is Cloudglue?

Cloudglue is the video context layer for agents and applications. It lets developers search, ask questions about, analyze, and extract structured data from individual videos and across thousands of hours of video as a unified, queryable corpus. Designed for simplicity, scale, and fidelity, Cloudglue unifies speech, diarization, visual understanding, sound, and on-screen text into simple, composable APIs, so you can enable video Q&A, semantic search, and structured data extraction with full citations in just a few lines of code, without building your own video-understanding stack. Whether you’re developing AI agent workflows, building creative tools, or analyzing meeting recordings, Cloudglue makes video queryable and actionable for any AI system and handles the infrastructure so you can focus on building features that matter to your users.

Get Started in 3 Steps

1

Get your API key

Sign up for free and get your API key from the dashboard.
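
Once you have a key, a common pattern is to keep it out of source code and read it from the environment. The sketch below is illustrative only: the variable name `CLOUDGLUE_API_KEY` and the helper `load_api_key` are assumptions, not something this page confirms, so check the SDK documentation for the name the client actually reads.

```python
import os

# Assumed variable name; confirm the real one in the SDK documentation.
API_KEY_ENV_VAR = "CLOUDGLUE_API_KEY"


def load_api_key(environ=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} to the API key from your dashboard."
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first API call.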
2

Install an SDK, or use MCP or Playground

Python SDK

Install our Python SDK

JavaScript SDK

Install our JavaScript SDK

MCP Server

Use the MCP Server for Claude, Cursor, or Windsurf integration.

Playground

Test directly in Playground.
3

Upload video and extract

Upload your first video, then extract structured data or chat across your videos in minutes.

Core Features

Video Document Parsing

Foundational APIs that transform unstructured video and audio into structured, queryable context.

Describe

Get a comprehensive moment-by-moment description of a video, including transcript, diarization, visual descriptions, audio descriptions, sound, on-screen text, and more. Perfect for capturing every detail of a video.

Extract

Extract structured data from videos at scale, across modalities, using a prompt or custom schema. This makes videos easy to program against, query, and categorize in your application.
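
Because extracted data is meant to be programmed against, it can help to check that a result has the shape you asked for before using it. The following is a minimal sketch that runs without the SDK, assuming the lightweight schema notation used in the Quick Example on this page (`{"speakers": ["string"]}`). The helper `matches_schema` and the type names other than `"string"` are hypothetical and are not part of Cloudglue.

```python
# Hypothetical mapping from schema type names to Python types.
# Only "string" appears on this page; the others are assumptions.
TYPE_NAMES = {"string": str, "number": (int, float), "boolean": bool}


def matches_schema(data, schema):
    """Return True if `data` has the shape described by `schema`."""
    if isinstance(schema, dict):
        # Every schema key must be present, with no extra keys.
        return (
            isinstance(data, dict)
            and data.keys() == schema.keys()
            and all(matches_schema(data[k], schema[k]) for k in schema)
        )
    if isinstance(schema, list):
        # A one-element list means "a list of items shaped like this element".
        return isinstance(data, list) and all(
            matches_schema(item, schema[0]) for item in data
        )
    # bool is a subclass of int in Python, so reject it for "number".
    if schema == "number" and isinstance(data, bool):
        return False
    return isinstance(data, TYPE_NAMES[schema])


schema = {"speakers": ["string"], "topics": ["string"]}
result = {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
print(matches_schema(result, schema))  # True
```

A check like this is only a local safeguard; it says nothing about how the service itself validates schemas.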

Segment

Split videos into meaningful parts with segmentation options such as intelligent shot detection and narrative (chapter) segmentation, turning videos into logical sequences.

Video Reasoning

Higher-level APIs that enable multimodal search, chat, and reasoning directly over video content.

Search

Add semantic search over videos and segments with natural-language queries. Enable this in your application with just a few lines of code.

Chat Completion

Add conversational AI that can query, compare, and reason across hundreds of videos, complete with full citations, with just a few lines of code.

What Makes Cloudglue Different

  • Multimodal AI: We don’t just transcribe speech; we understand the full context, including visual content, audio descriptions, on-screen text, and diarization.
    • Prefer speech-only? You can disable multimodality and use transcripts alone.
  • Scale: Built to handle hour-long (or longer) videos, and reason across hundreds or even thousands of videos at once, all using the same simple primitives.
  • Developer-First: Clean APIs, comprehensive SDKs, and tools built for developers.
  • Robust: Designed for production workloads with reliable performance across large video datasets.
  • Real-time Integration: Rich partner ecosystem for building integrations, including MCP server support for direct AI assistant integration.
  • Backed by Research: Cloudglue continuously integrates the latest advancements in multimodal AI, so as foundation models improve, your application benefits too. Our infrastructure is built and maintained by a team that actively publishes research in large-scale video and audio understanding.

Quick Example

Here’s how easy it is to extract structured data from any video:
from cloudglue import CloudGlue

client = CloudGlue()

# Upload and extract
uploaded = client.files.upload(
    'path/to/local/video.mp4',
    wait_until_finish=True
)

extraction = client.extract.run(
    url=uploaded.uri,
    prompt="Extract all speakers and main topics discussed",
    schema={"speakers": ["string"], "topics": ["string"]}
)

print(extraction.data)
# {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]}
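
Once results come back as plain dictionaries like the one above, ordinary Python is enough to work across many videos. As a rough sketch, the hypothetical helper `count_topics` below tallies how many videos mention each topic. It makes no Cloudglue calls, and the second record is made-up sample data in the same shape as the example output.

```python
from collections import Counter


def count_topics(results):
    """Count how many videos mention each topic."""
    counts = Counter()
    for data in results:
        # set() so a topic repeated within one video is counted once.
        counts.update(set(data.get("topics", [])))
    return counts


results = [
    {"speakers": ["John Smith", "Sarah Johnson"], "topics": ["AI", "Marketing"]},
    {"speakers": ["Sarah Johnson"], "topics": ["AI", "Pricing"]},  # made-up sample
]

print(count_topics(results).most_common(1))
# [('AI', 2)]
```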

Video Q&A Chatbots

Build intelligent chatbots that can answer questions about video content, perfect for training materials, meetings, and educational content.

Structured Data Extraction

Extract specific information like product details, people, locations, or any custom data schema from video content at scale.

Video Knowledge Bases

Create searchable knowledge bases from video recordings, making hours of content instantly accessible and queryable.

Capabilities at a Glance

Collections & Organization

Integrations & Tools

  • MCP Server - Direct integration with Claude Desktop and Cursor
  • Playground - Test and experiment with video processing
  • Schema Builder - Visual tool for creating extraction schemas
  • Webhooks - Real-time processing notifications

Multimodal Understanding

  • Speech Transcription - Accurate speech-to-text with speaker identification
  • Visual Scene Analysis - Detailed descriptions of what’s happening visually
  • Scene Text Recognition - Extract text visible on screen (captions, presentations, etc.)
  • YouTube Integration - Process videos directly from YouTube URLs
  • Audio Description - Extract audio descriptions from video content
  • Face Detection and Matching - Find videos with a given face

Next Steps

Choose your path based on what you want to accomplish:

New to Cloudglue?

Start with our setup guide to get your API key and SDK installed.

See Examples

Explore detailed use cases with step-by-step implementations.

API Reference

Dive into the full API documentation and endpoint details.

Try the Playground

Test video processing directly in your browser without any code.

Resources