Generative AI is a transformative technology that helps people create synthetic media on par with images, audio and video produced by traditional means. Without robust content moderation implemented at the point of creation, however, these powerful tools become double-edged swords: harmful content can be created more easily than before and then distributed at scale.
Our approach to content moderation
When Synthesia started in 2017, its founders recognized that generative AI is a powerful technology that, when placed in the hands of people with bad intentions, would be misused. So they made three important decisions on day one.
First, we will never create a clone of someone without their consent. Non-consensual deepfakes are the biggest source of harmful synthetic media, so by building consent into the process of creating an AI avatar, we create an effective barrier against attempts to misuse our platform.
Second, we will implement content moderation at the point of creation and define robust content policies that limit the spread of misinformation, fraud, violent extremism, and other forms of abusive AI-generated content. By placing videos into our content moderation workflow before they’re created, we keep the platform safe for everyone and ensure the videos made on it adhere to the highest possible standards of ethics and professionalism.
Finally, while we cannot claim to be perfect, we believe in the importance of collaboration, so we will work with our peers in the industry, policymakers and other organizations to learn from each other, share our efforts and lessons learned, and evolve our responsible AI practices as new threats emerge. As a result, we’ve participated in the Deepfake Rapid Response Force established by WITNESS, collaborated with Partnership on AI on best practices for synthetic media, and contributed to the Content Authenticity Initiative set up by Adobe, Microsoft, Nvidia, Reuters and other media industry leaders. We’ve also advocated for legislation in the UK, Europe and beyond to protect people from the dangers of non-consensual deepfakes, such as the creation and distribution of sexually explicit videos and images, or fraud.
Supporting the Responsible AI framework from RIL
More recently, we’ve signed up as supporters of the Responsible AI framework developed by Responsible Innovation Labs (RIL). The framework includes checklists, tools and how-to guides distilled from industry, civil society, and the public sector to create actionable commitments backed by real-world examples. It also recognizes the different needs and capabilities of startups, helping them evolve their approach to responsible innovation from the early stages of seed funding through to achieving product-market fit and growing into a scaleup.
One of the recommendations in RIL’s framework is to audit and test products against misuse, which is why Synthesia just completed a comprehensive red team exercise to test our content moderation capabilities. The results were both enlightening and encouraging, underscoring the critical need for oversight in the age of AI-generated content.
While we’ve run internal red team exercises in the past, this was the first test in which we worked with an external partner that specializes in threat intelligence and content moderation.
Red teaming our content moderation systems
Two experts from Risky Business Solutions LLC spent a month probing our AI systems for vulnerabilities. Drawing on a combined 20 years of experience, one as an NSA analyst and the other in influence operations for the United States Air Force, as well as their work as investigators for companies like Meta, Gap, and StubHub, they attempted to generate harmful, biased, or misleading content using a variety of techniques, from subtle prompts to more overt attempts at circumventing our safeguards.
Additionally, they produced a threat intelligence report that analyzed the dark web, mainstream and niche social media platforms, cybercrime forums, and search engines to understand how bad-faith actors are thinking about misusing generative AI. The report yielded valuable insights for our security team, including the most common types of attacks and harmful content that bad-faith actors pursue with generative AI, such as financially motivated scams, automated clickbait and spam campaigns.
Here are four lessons we’ve learned from their red team test, which wrapped up last month.
- Nuanced challenges: While there are certain content categories, such as terrorism, graphic content and child endangerment, that everyone can agree are harmful, Risky Business Solutions also tested whether our systems can catch less obvious attempts to generate abusive content. For instance, the team attempted to generate videos designed to influence viewers’ political views with polarizing, news-like content, or to convince them to invest in dubious cryptocurrencies.
Thankfully, our content moderation systems caught every attempt to generate such content and the videos were never created.
- Tackling worst offenders: Risky Business Solutions observed instances where other generative AI platforms incentivize bad behavior by not banning accounts for extreme violations and even refunding bad-faith actors when they attempt to create harmful content.
In contrast, Synthesia banned accounts that attempted to create bullying and harassment-type content and never refunded the money when the content was found to break our policies. These strict rules create a powerful financial deterrent: bad-faith actors avoid our platform because they quickly start incurring significant monetary losses.
- Scale and speed: The sheer volume and speed of content generation possible with AI present unique challenges. While other companies employ no content moderation or rely on a basic keyword-based approach, Synthesia uses a combination of sophisticated automated systems and an in-house moderation team that can withstand high-volume attempts (a simplified, hypothetical sketch of such a pre-generation gate follows this list).
However, despite our best efforts, our systems are never going to be perfect. There was one instance where slight changes to a COVID-19 misinformation test video evaded our detection. We investigated the case and found that the video passed our human review process because of an error made by a content moderator, even though our automated systems had flagged the video as breaching our content policies. To confirm this was a one-off enforcement error, we asked the Risky Business Solutions team to stress test this area further, and no additional harmful videos slipped through.
While attempting to generate dozens of variations of a violating video on Synthesia would be costly and time consuming, the finding does highlight the importance of building more sophisticated defense mechanisms that can cope with DDoS-style attempts to overwhelm the content moderation system.
- Separating genuine mistakes from harmful intentions: Our systems perform well in detecting overt attempts to create harmful content, but they also need to account for situations when people make genuine mistakes, particularly for more complex or nuanced videos. For example, is someone making a video educating their audience about blockchain technology or are they trying to engage in a pump-and-dump crypto scheme? Are they educating people about a particular medical condition or attempting to sell miracle drugs?
In two instances, the report found that the justification we gave for refusing to generate a video could be confusing to people who are not well versed in our content policies and had simply made a mistake. For example, we refused to generate a video containing current affairs-like content, citing our misinformation policy. Even though the video was factually accurate, our misinformation policy contains a clause stating that non-news organizations cannot use stock avatars on our platform for media reporting, and we also ban the creation of opinion-style political content with stock avatars. We made these decisions to prevent the generation of mis- and disinformation, but because the policy is based on a combination of behaviors and content, it should have been explained more clearly in the rejection message.
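To make the "scale and speed" point above more concrete, here is a minimal, hypothetical sketch of what a pre-generation moderation gate could look like: automated checks score a script and compare it against previously blocked scripts, and anything borderline is held for human review instead of being rendered. Synthesia has not published the internals of its moderation pipeline, so every name, threshold and the toy classifier below are illustrative assumptions rather than a description of the actual system.

```python
# Hypothetical sketch only: Synthesia has not published its moderation internals.
# All class/function names and thresholds below are illustrative assumptions.
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"            # render the video
    HUMAN_REVIEW = "review"    # hold rendering until a moderator decides
    BLOCK = "block"            # refuse and explain which policy was triggered


@dataclass
class ModerationResult:
    decision: Decision
    reason: str


def policy_risk_score(script: str) -> float:
    # Stand-in for an automated policy classifier (e.g. a fine-tuned text model).
    risky_terms = ("guaranteed returns", "miracle cure", "wire the money")
    hits = sum(term in script.lower() for term in risky_terms)
    return min(1.0, hits / 2)


def matches_known_violation(script: str, blocked_scripts: list[str]) -> bool:
    # Near-duplicate check against previously blocked scripts, so that lightly
    # edited variants of a known violating video are also caught.
    return any(
        SequenceMatcher(None, script.lower(), blocked.lower()).ratio() > 0.85
        for blocked in blocked_scripts
    )


def moderate_before_render(script: str, blocked_scripts: list[str]) -> ModerationResult:
    if matches_known_violation(script, blocked_scripts):
        return ModerationResult(Decision.BLOCK, "Near-duplicate of previously blocked content")
    risk = policy_risk_score(script)
    if risk >= 0.9:
        return ModerationResult(Decision.BLOCK, "Automated policy check failed")
    if risk >= 0.4:
        return ModerationResult(Decision.HUMAN_REVIEW, "Borderline content routed to a moderator")
    return ModerationResult(Decision.ALLOW, "No policy concerns detected")


if __name__ == "__main__":
    blocked = ["Invest now for guaranteed returns on this new coin."]
    print(moderate_before_render("Invest today for guaranteed returns on this new coin!", blocked))
    print(moderate_before_render("A short explainer about how blockchains store transactions.", blocked))
```

The key design choice this sketch illustrates is that the gate runs before anything is rendered, and that resubmitted variants are compared against a history of blocked scripts rather than evaluated in isolation.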
The outcome of this test reflects our investments in content moderation over the years, as well as recent initiatives to build up our defenses, including:
- Scalable infrastructure and updated policies: We've invested in more powerful technology to handle high-volume content moderation without compromising on accuracy or speed. In response to findings from Graphika that were reported by the New York Times, we’ve also updated our content policies to prevent the creation of political content with stock avatars or news-like content from non-enterprise accounts.
- Treating safety as a product, not just as an operational concept: We've dedicated 10% of the company’s headcount to trust and safety work so we can handle more complex cases and continually improve our systems. Rather than assigning this work to just one operational team which sits isolated from product development, we’ve extended AI safety and security responsibilities to our product and engineering teams to ensure that the features they’re building are designed with safety in mind.
- Transparency and accountability: We're committed to regular public audits of our systems and outcomes and to adopting the highest international standards for AI safety. For example, we’ve started the ISO 42001 certification process, which should complete over the next few months. ISO 42001 is an international standard designed to ensure the responsible development and use of AI systems within organizations.
- Providing a channel for user feedback: We’ve recently introduced an appeals process through which users can request a second review of our content moderation decisions, to account for situations where we’ve made a genuine mistake (a minimal sketch of what such an appeals record could look like follows this list). We also give people the ability to report any concerns they might have about our use of AI, including misuse of our platform or content that violates our policies.
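As an illustration of that last point, the record behind an appeal could be as simple as the sketch below. The field names, statuses and routing rule are assumptions made for the sake of the example, not a description of our actual implementation.

```python
# Illustrative sketch only: field names and statuses are assumptions, not
# Synthesia's actual appeals implementation.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Appeal:
    video_id: str
    original_decision: str          # e.g. "blocked: misinformation policy"
    user_explanation: str           # the creator's account of their intent
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    second_reviewer: Optional[str] = None   # a different moderator than the first
    outcome: Optional[str] = None           # "upheld" or "overturned" after review


def assign_second_review(appeal: Appeal, reviewer: str, first_reviewer: str) -> None:
    # Route the appeal to a moderator who did not make the original decision,
    # so genuine mistakes can be caught by a fresh pair of eyes.
    if reviewer == first_reviewer:
        raise ValueError("Appeals should be reviewed by a different moderator")
    appeal.second_reviewer = reviewer


if __name__ == "__main__":
    appeal = Appeal(
        video_id="vid_123",
        original_decision="blocked: misinformation policy",
        user_explanation="This is an internal training video about our newsroom workflow.",
    )
    assign_second_review(appeal, reviewer="moderator_b", first_reviewer="moderator_a")
    print(appeal)
```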
As AI-generated content becomes more prevalent, companies developing or using generative AI platforms need to think more seriously about trust and safety. Those that don’t will not only have to deal with the legal consequences of their inaction but will also risk damage to their brand and to the integrity of the information ecosystem.
Synthesia's founders have always seen AI safety as a core part of building a responsible and reliable platform for video communications and knowledge sharing. Treating AI governance as integral to our product development process gives our customers confidence they can leverage our cutting-edge AI capabilities while upholding their ethical and legal obligations.
We’ve been transparent about the capabilities and limitations of our platform, and by sharing the results of this red teaming effort we hope to create an open environment where industry peers, policymakers, and the public can engage with us in these crucial conversations.
Together, we can harness the power of AI while safeguarding the values that make our digital spaces creative and trustworthy.
About the author
Alexandru Voica
Alexandru Voica is the Head of Corporate Affairs and Policy at Synthesia.