Python

You can find the package on PyPI.

Installation

To install the Cloudglue Python SDK, use pip:

pip install cloudglue

Usage

Get an API key from cloudglue.dev
Set the API key as an environment variable named CLOUDGLUE_API_KEY or pass the api key in as a parameter to the CloudGlue class constructor

Here’s an example of how to create an Cloudglue client:

from cloudglue import CloudGlue

client = CloudGlue(api_key='cg-your-api-key-here')

Working with Files

To use most Cloudglue APIs you’ll need to operate on a file uploaded into Cloudglue. Below are the basics for working with files:

# See your current files
my_files = client.files.list()
print(my_files)

# Upload a file
my_file = client.files.upload(
    file_path="/path/to/your/video.mp4",
    # Optional user defined metadata that can be stored with file
    metadata={
        "subject": ["education", "software"]
    },
    # optionally waiting until finish
    wait_until_finish=True
)

# These identifiers are generated by cloudglue and can be used to reference your video in different APIs
print(my_file.uri)
print(my_file.id)

# You can also check status of file upload (helpful in case that you didn't wait until finish)
file_info = client.files.get(file_id=my_file.id)
print(file_info)

Note: When uploading files, make sure to:

Handle the file reading properly with error checking
Set the correct MIME type for your video file (e.g., ‘video/mp4’, ‘video/quicktime’, etc.)
Include relevant metadata about the upload
Handle potential upload errors appropriately

Generating Transcripts

You can use our Transcribe API to generate multimodal transcripts from videos. Getting started is easy, just specify the video you want to transcribe and the configuration of the transcription.

result = client.transcribe.run(url=my_file.uri, {
  enable_summary:true,
  enable_speech:true,
  enable_scene_text:true,
  enable_visual_scene_description:true,
})

print(result)

YouTube URLs are supported as input to this API.

Note that YouTube videos are currently limited to speech and metadata level understanding, for fully fledged multimodal video understanding please upload a file instead to the Cloudglue Files API and use that input instead.


result = client.transcribe.run(url=my_file.uri, {
enable_summary:true,
enable_speech:true,
enable_scene_text:false,
enable_visual_scene_description:false,
})

print(result)

Extracting Structured Information from Videos

Organizing information in structured entity schemas allow for response types that are easy to program AI applications against. Let’s get started with extracting structured entity information from videos.

Prompt Only Extraction

Below we’ll show how to extract information from a video using natural language to guide entire process. This is particularly helpful during the exploratory phase where your entity structure may not be completely known yet.

# Specify a prompt for what you are interested in extracting
prompt = "Extract the restaurant name and specialty information from this video"

# Kick off a extract job and wait for result
result = client.extract.run(url=my_file.uri, prompt=prompt)
print(result)

# Alternatively you could use the APIs to create the extract job and check on result later
my_job = client.extract.create(url=my_file.uri, prompt=prompt)
result = client.extract.get(job_id=my_job.id)
print(result)

YouTube URLs are supported as input to this API.

# Specify a prompt for what you are interested in extracting
prompt = "Extract the main talking points in this video"

# Extract entities from the spoken content of a YouTube video
result = client.extract.run(url="https://www.youtube.com/shorts/BLAH" , prompt=prompt)
print(result)

Schema Driven Extraction

In Cloudglue you can direct the extraction process to get data in a specific format, which is helpful if your downstream application requires programming against specific fields or storing data in a database with specific structure. We allow users to specify schemas either as an example/abbreviated json object or a fully fledged JSON object specification. For convenience we provide a graphical entity schema builder. In food review videos for example let’s say we really want to know the restaurant name and some review blurb for our table, in which case your entity schema might look something like this

{
  "restaurants": [
    {
      "name": "string",
      "blurb": "blurb by reviewer on why this place matters"
    }
  ]
}

Now lets extract data using this schema

# Schema driven extraction
result = client.extract.run(
    url=my_file.uri,
    # Our restaurant review schema
    schema= {
        "restaurants": [{
            "name": "string",
            "blurb": "blurb by reviewer on why this place matters"
    }]}
    # Optionally you could also specify a prompt relevant for guiding the extraction further
    # prompt=prompt
)
print(result)

Working with Collections

Our abstraction for that is called “Collections”. In a collection not only can you logically store related resources together under a single umbrella, you can also give the platform guidance on the types of information you want described or extracted as entities at rest for later usage. Below are the basics for working with a collection

# See your current collections
my_collections = client.files.list()
print(my_collections)

# Here we specify our collection with a description for future reference, and optionally you can configure the types of entities you want to extract from the videos
my_collection = client.collections.create(
    name="My Must Eats",
    description="All my favorite food videos in one collection",
    collection_type="entities",
    extract_config={
        "schema": {
            "foods": ["string"],
            "cuisines": ["string"],
        }
    }
)
print(my_collection)

# Now that our collection is created we can add files to it
file_info = client.collections.add_video(
    collection_id=my_collection.id,
    file_id=my_file.id,
    wait_until_finish=True
)
print(file_info)

# And we can list the files in the collection
collection_files = client.collections.list_videos(
    collection_id=my_collection.id
)

# Add a YouTube video to the collection
# Note that YouTube processing is limited to speech and to get the full richness of multimodal video understanding upload a video to Files API instead
file_info = client.collections.add_youtube_video(
    collection_id=my_collection.id,
    url="https://www.youtube.com/watch?v=BLAH"
)

Once files are processed into a collection their video entities become available for future reference.

# Now let's view the extracted entities
entities = client.collections.get_video_entities(
    collection_id=my_collection.id,
    file_id=my_file.id
)
print(entities)

Talking with Videos

Similar to how chat gpt or claude operate we expose a chat completion API, with a couple extra parameters to allow you to interact with your video collections. Namely we expose a collections list parameter, to specify which video collection to talk to and also allow some flags like force_search which gives the underlying LLM a hint that we need you to always execute a search for the incoming message as well as include_citations which tells the system to provide references for the information described by the chat generation.


my_collection = client.collections.create(
    name="My Must Eats",
    description="All my favorite food videos in one collection",
    collection_type="rich_transcripts",
    transcribe_config={
        "enable_summary":true,
        "enable_speech":true,
        "enable_scene_text":true,
        "enable_visual_scene_description":true,
    }
)

# Define your messages
query = "What do people like about Franklin's in Austin?"
messages = [
    {"role": "user", "content": query}
]

# Make an chat completion request
response = client.chat.completions.create(
    messages=messages,
    model="nimbus-001",
    collections=[my_collection.id],
    force_search=True,
    include_citations=True
)
print(response)

Overall

SDKs

Installation

Usage

Working with Files

Generating Transcripts

Extracting Structured Information from Videos

Prompt Only Extraction

Schema Driven Extraction

Working with Collections

Talking with Videos

Overall

SDKs

​Installation

​Usage

​Working with Files

​Generating Transcripts

​Extracting Structured Information from Videos

​Prompt Only Extraction

​Schema Driven Extraction

​Working with Collections

​Talking with Videos

Installation

Usage

Working with Files

Generating Transcripts

Extracting Structured Information from Videos

Prompt Only Extraction

Schema Driven Extraction

Working with Collections

Talking with Videos