Expressive Avatars powered by Synthesia’s new EXPRESS-1 model are here

Written by Jon Starck
Published on December 2, 2024

Welcome to a new era of video communications and knowledge sharing at work

While tools like email, messaging, and file sharing have been invaluable in today’s fast-paced, distributed world of work, they often fall short in delivering the deep understanding, creative spark, and personal connections that are so vital to driving innovation and impact. 

That's why video is emerging as a powerful new medium in the enterprise space, transforming how workplaces communicate and collaborate. By adding engaging visuals, vocal intonation or body language to our digital interactions, video bridges the gaps that text alone cannot.

However, until recently, only a few people in a company had the ability or skills to produce and distribute video at scale. So in 2017, we set out on a mission to change that by building a platform that allows companies to fully embrace video for business communications and knowledge sharing, and transform any employee into a video creator in the process. 

At the center of Synthesia’s platform are our AI avatars: dynamic and lifelike digital personas that blend the best of human and artificial intelligence into one seamless experience. Over 200,000 people have used our 225 avatars to create more than 18 million video presentations and published them in over 130 languages. 

Introducing Expressive Avatars

Today, we are excited to unveil the fourth generation of Synthesia’s AI avatars, which we’re calling Expressive Avatars. Expressive Avatars are powered by our EXPRESS-1 model for realistic avatar performance. By training a model to understand the intricate relationship between what we say and how we say it, we have enabled Expressive Avatars to perform their script with the correct tone of voice, body language and lip sync, like a real actor would.

With these new avatars, we’re not just creating digital renders; we’re introducing digital actors. This technology brings a level of sophistication and realism to digital avatars that blurs the line between the virtual and the real.

Expressive Avatars are the world’s first avatars fully generated with AI. More specifically, our EXPRESS-1 model uses large, pre-trained models as a backbone to drive the performance for Expressive Avatars, combined with diffusion to model complex multimodal distributions.

EXPRESS-1 predicts every movement and facial expression in real time, aligning seamlessly with the timings, intonations, and emphasis of spoken language. This results in performances that are astonishingly naturalistic and human-like, setting a new standard in the industry.

Whereas other solutions available from our competitors are limited by pre-recorded and pre-defined dynamics, Synthesia’s fourth generation avatars are capable of generating entirely new and unique performances, greatly expanding the range of expression and interaction.

This leap forward opens up a performance space that was previously unimaginable, bringing human-like realism to the avatars’ delivery.

You can experience the true power of our Expressive Avatars on our website at www.synthesia.io/avatars. The text you provide acts as the script that our avatars use to deliver a variety of performances.

If one rendition doesn’t quite capture the desired emotion or emphasis, users can simply regenerate for a different take. This feature empowers users to achieve the perfect expression for their needs, making each interaction unique and tailored.

How Expressive Avatars work

Expressive Avatars don’t just mimic human speech; they understand its context using our custom-built EXPRESS-1 model. Whether the conversation is cheerful or somber, our avatars adjust their performance accordingly, displaying a level of empathy and understanding that was once the sole domain of human actors.

The generative capabilities of these new avatars also extend beyond mere motion. Their facial expressions, blinking, and even eye gaze are now perfectly attuned to their speech. Expressive Avatars synchronize flawlessly with audio inputs, ensuring that every gesture and expression aligns perfectly with the spoken word. This harmony of motion and sound elevates the realism of our avatars and captures every nuance of human expression, bringing our avatars to life like never before.

Safety built-in from day one

Synthesia has been focused on developing AI responsibly from day one. We are aware that Expressive Avatars are a powerful new technology, released during an important year for democracy, when billions of people around the world exercise their right to vote. 

We’ve also seen how, in the hands of companies who don’t think about trust and safety, AI can be misused to interfere with our civic processes or cause individual and societal harm. 

Therefore, we’ve taken additional steps to prevent the misuse of our platform, including updating our policies to restrict the type of content people can make, investing in the early detection of bad faith actors, increasing the teams that work on AI safety, and experimenting with content credentials technologies such as C2PA. You can read more about these efforts on our blog.

The future of communication and collaboration in the workplace

Expressive Avatars are the first step towards the next frontier of digital communication and knowledge sharing. Imagine, for example, the typical onboarding experience for a new hire: today, it’s a static video, which is still an improvement over a lengthy document.

But soon you’ll be able to recreate the entire workplace experience inside our platform, with AI avatars that can move around in three-dimensional spaces and communicate with you or other avatars, showing you around the workspace, helping you find a meeting room, teaching you about your company and its products, or introducing you to your colleagues before you’ve met them in real life.

Amazon Web Services (AWS) has been one of the first companies to get early access to this technology.

Tanuja Randery, EMEA MD, Amazon Web Services said: “I experienced first-hand having my own avatar created for my keynote at AWS Summit London, and saw the potential for this technology to deliver engaging business communications in many different languages and scenarios that simply wouldn’t be possible otherwise. We’re delighted that Synthesia has selected AWS for scalable, flexible, and secure cloud compute to build and train their large language models which generate Expressive Avatars.”

Here’s what our partners have to say about Expressive Avatars: 

Philippe Botteri, Partner at Accel said: "From the moment we met Victor and Steffen, Synthesia stood out as one of the few generative AI companies that brought together an exceptional founding team, differentiated technology and a clear ROI for enterprise customers. Victor, Steffen and the Synthesia team’s ability to push the boundaries of what’s possible on the research front while translating those breakthroughs into customer-facing improvements in the product has always shone through and today’s launch of Expressive Avatars is another fantastic example of this. The team has really set a new standard for the rest of the industry!"


"Synthesia's new Expressive Avatars are truly a breakthrough in generative AI," said Josh Coyne, Partner at Kleiner Perkins. "We've been strong supporters of Synthesia and its co-founders since the early days of the company, and this new technology further reinforces our belief in the immense potential of AI avatars and synthetic media to revolutionize the landscape of enterprise communications and knowledge sharing."


"The launch of Expressive Avatars is a game changer for workplace communication and gives enterprises the power to create dynamic videos that shape the employee experience," said Vidu Shanmugarajah, Partner, GV (Google Ventures). "As repeat investors since 2021, we continue to be impressed by Synthesia's ability to use AI to unlock real value for the world’s leading companies."

We can’t wait to see what people create using Expressive Avatars. As Gen Z and Gen Alpha enter the workforce, video will become the default medium for collaboration and communication at work, and our AI avatars will reshape how employees connect, interact and learn, creating a more dynamic and knowledgeable workforce fit for tomorrow.

About the author

Jon Starck, CTO

Jonathan Starck is CTO at Synthesia, a startup founded in 2017 and now a generative AI unicorn.


Frequently asked questions