In today’s digital landscape, video is one of the most powerful communication tools, essential for education, marketing, and customer support. But as video libraries grow, finding the right content quickly becomes a challenge. At Synthesia, we’re tackling this problem by building robust search capabilities that let users navigate vast video libraries effortlessly.
Have you ever needed to track down a video you’ve seen before, remembering only fragments of its content but not its title or location? For cases like this, a deep search that includes the video’s actual transcript is critical.
In this blog post, we’ll first dive into the challenges of providing such search capabilities, then quickly go over the business impact of having a powerful video search solution.
The Search Problem: Why Video Discovery is Hard
Searching through a library of videos is not as simple as a basic file search. Unlike text-based documents, videos rely heavily on metadata—titles, descriptions, and tags—to be organized and found. This leaves a lot of room for inconsistency, human error, and missing information.
Consider the following challenges:
1. Unstructured metadata: Users might label videos with inconsistent titles and descriptions, making it difficult to search for specific content.
2. Duplicate efforts: Without a robust search, users may inadvertently recreate videos that already exist.
3. Missed opportunities: Key content could be overlooked, especially if relevant videos are buried deep within folders.
Currently, Synthesia offers tools like workspaces, folders, and custom titles/descriptions to organize video content. While these tools are useful, they require manual input and consistency, which doesn’t always scale well in larger environments.
In addition, a unified search experience brings content to wherever users already are: a search bar on a website, a private note in a ticketing system, an app in Slack or Teams, and more. Making search available across these platforms reduces the time spent locating videos, enhances workflow integration, and ensures wider adoption.
The Cost of Inefficient Video Search
Without a robust video search system, organizations risk inefficiencies in several areas:
1. Missed Information: Important videos may go unnoticed, leading to missed opportunities for education, marketing, or customer service.
2. Duplicated Efforts: Users may recreate videos unnecessarily, wasting time and resources.
3. Increased Frustration: Users may become frustrated if they can't find the videos they need, impacting overall satisfaction.
Going Beyond Titles: Our Advanced Content-Based Search Solution
To solve these challenges, our Strategic Implementation Specialist team is hard at work on a search system. This system leverages advanced natural language processing (NLP) models and a vector-based search mechanism to index and query video content with precision. By using advanced embeddings, our system helps users locate videos based on context, capturing the intent behind their queries for quicker, more relevant results.
This improves search result quality: the system understands the actual meaning of a video presentation through its transcript, allowing users to search the video’s content rather than only its title and description.
We are focused on three key areas:
1. Search Query to Embedding: The system uses OpenAI's embedding models to convert search queries into embeddings—mathematical representations that capture the semantic meaning of the query. This allows us to go beyond simple keyword matching.
2. Querying the Vector Database: Once the query has been converted to an embedding, it's used to search a vector database (MongoDB Atlas). This ensures that search results are based on the semantic content of the videos, not just their titles or descriptions.
3. Storing Search Queries: Every search is stored in our system, along with the results and their relevance scores. This allows us to continuously improve the search algorithm and provide insights into the types of searches users are conducting.
By combining these three elements, our search system will allow users to find the most relevant videos based on their intent, not just the keywords they type.
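To make the first step concrete, here is a minimal sketch of converting a query into an embedding with OpenAI's Python SDK. The model name and helper function are illustrative assumptions for this post, not necessarily what runs in production.

```python
# Minimal sketch of step 1: turning a search query into an embedding.
# Assumes the official OpenAI Python SDK (v1+) with OPENAI_API_KEY set;
# the model choice is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

def embed_query(query: str) -> list[float]:
    """Return a vector capturing the semantic meaning of the query."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed embedding model
        input=query,
    )
    return response.data[0].embedding

# Different phrasings of the same intent land close together in
# embedding space, which is what lets search go beyond keyword matching.
vector = embed_query("onboarding video about expense reports")
print(len(vector))  # 1536 dimensions for this model
```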
Technical details
Let's examine the architecture of our current solution and its key workflows.
The foundation of this solution is a vector database designed to index video content, with each segment represented by its unique embedding.
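For illustration, a vector index over these per-segment embeddings could be defined on a MongoDB Atlas collection as below. The collection layout, field names, dimensionality, and similarity metric are assumptions for the sketch, not a description of our production schema.

```python
# Hypothetical Atlas Vector Search index over per-segment embeddings.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

segments = MongoClient("mongodb+srv://...")["videos"]["segments"]

index = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",     # where each segment's vector is stored
                "numDimensions": 1536,   # must match the embedding model's output
                "similarity": "cosine",  # assumed similarity metric
            }
        ]
    },
    name="segment_vector_index",
    type="vectorSearch",
)
segments.create_search_index(model=index)
```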
To populate the database, we leverage Synthesia's API to retrieve subtitles as SRT files. To ensure meaningful segmentation, we first process these SRT files with an LLM, converting them into structured markdown. This allows us to apply LangChain's native text splitter, breaking the content into manageable, meaningful pieces. Each segment is then converted into an embedding and stored in the database for easy retrieval.
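A simplified version of this ingestion flow might look like the sketch below. It assumes a chat model for the SRT-to-markdown conversion, LangChain's MarkdownHeaderTextSplitter as the text splitter, and the embedding model from the earlier sketch; the prompt, model names, and document schema are all illustrative.

```python
# Illustrative ingestion pipeline: SRT -> markdown -> segments -> embeddings.
from openai import OpenAI
from langchain_text_splitters import MarkdownHeaderTextSplitter

client = OpenAI()

def srt_to_markdown(srt_text: str) -> str:
    """Ask an LLM to restructure raw subtitles into sectioned markdown."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": "Rewrite these SRT subtitles as "
             "markdown with a ## heading per topic. Keep the wording intact."},
            {"role": "user", "content": srt_text},
        ],
    )
    return response.choices[0].message.content

def index_video(video_id: str, srt_text: str) -> None:
    """Split the restructured transcript and store one embedding per segment."""
    splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("##", "section")])
    for segment in splitter.split_text(srt_to_markdown(srt_text)):
        embedding = client.embeddings.create(
            model="text-embedding-3-small",  # same model as the query side
            input=segment.page_content,
        ).data[0].embedding
        segments.insert_one({  # `segments` collection from the index sketch
            "video_id": video_id,
            "text": segment.page_content,
            "embedding": embedding,
        })
```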
On the search side, the process is more straightforward. We embed each search query just as we embed the transcripts, so queries and video segments live in the same semantic space; this is what enables a highly accurate, context-driven search experience. A vector search then locates the most relevant video segments, complete with relevancy scores. While our initial demonstration uses a React frontend, the API endpoint remains open for flexible integration by users.
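Putting the pieces together, the query side could be sketched as a single Atlas $vectorSearch aggregation. It reuses `embed_query` and the `segments` collection from the earlier sketches, and the logging step mirrors the third key area described above; again, names and parameters are assumptions.

```python
# Illustrative search: rank segments by semantic similarity, return scores,
# and log the query for later analysis (step 3 from the overview).
searches = segments.database["searches"]  # hypothetical query-log collection

def search_segments(query: str, limit: int = 5) -> list[dict]:
    pipeline = [
        {
            "$vectorSearch": {
                "index": "segment_vector_index",
                "path": "embedding",
                "queryVector": embed_query(query),
                "numCandidates": 100,  # oversample, then keep the top `limit`
                "limit": limit,
            }
        },
        {
            "$project": {
                "video_id": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},  # relevancy score
            }
        },
    ]
    results = list(segments.aggregate(pipeline))
    searches.insert_one({"query": query, "results": results})
    return results
```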
Conclusion
At Synthesia, we’re committed to helping users find the content they need quickly and easily. Our global search system is a key part of this mission, and we’re excited about the possibilities it opens up for our customers. Whether it's improving education, customer support, or marketing, we believe that enhanced video search will play a critical role in the future of digital communication. Note that we've only covered a small part of the technical implementation here. In upcoming blog posts, we will explore multiple integration options that make this search engine even more powerful.
Stay tuned as we continue to develop and refine this system, and don’t hesitate to reach out if you’d like to learn more about how Synthesia can help your organization.
We’d love to hear your thoughts! Share your video search challenges, suggestions, and insights in the comments below, or feel free to reach out to us directly—your feedback can shape our future developments.
About the author
Nicolas Narbais
Nicolas is a solution architect. He has worked across various industries, specializing in SaaS platforms and cloud-based architectures, with expertise in technologies like Datadog, AWS, serverless computing, and data infrastructure.
Frequently asked questions
Why is video content search so challenging compared to text-based search?
Unlike text documents, videos rely on metadata such as titles, descriptions, and tags for discoverability. This metadata is often inconsistent or incomplete, making it difficult to locate specific content. Additionally, videos lack the inherent structure of text, requiring advanced tools like transcripts and semantic search to make content searchable effectively.
How does Synthesia's advanced search system improve video discoverability?
Synthesia leverages natural language processing (NLP) and vector-based search to index and retrieve video content based on context, not just keywords. By using embeddings to understand the intent behind queries, users can find the most relevant videos quickly and accurately, even when metadata is insufficient.
What types of integrations are supported for Synthesia’s search solution?
Synthesia’s search system is designed for seamless integration into various platforms, including website search bars, ticketing systems, and communication tools like Slack and Teams. This flexibility ensures that users can locate video content directly within their existing workflows.
How does Synthesia handle search queries to continuously improve results?
Every search query and its corresponding results are logged with relevance scores. This data is used to refine the search algorithm over time, ensuring better accuracy and a more intuitive experience for users as the system learns from their behaviors.