Uncovering a German Toxic Language Bias in Google’s AI Tool
A team of researchers led by the University of Applied Sciences and Arts of Southern Switzerland, supported by USC Viterbi’s Information Sciences Institute (ISI), has revealed a striking language bias in Google’s Perspective API, a machine learning tool widely used to assess the toxicity of social media messages. The study, presented at the 2025 International AAAI Conference on Web and Social Media (ICWSM), found that the tool consistently rates German-language texts as significantly more toxic than identical content in English, raising concerns for both academic research and real-world content moderation.
“These tools are widely used for academic research and automatic content moderation by different organizations, but their internal workings are opaque,” said study first author Gianluca Nogara, a scientific collaborator at the University of Applied Sciences and Arts of Southern Switzerland. “We often lack access to the training data, model architecture, or implementation details, which makes it difficult to assess the reliability and fairness of their outputs.”
Perspective API is used by platforms such as Reddit and The New York Times, and had over 1,400 mentions on Google Scholar as of January 2024. It assigns each message a toxicity score between 0 and 1 that estimates how likely the text is to contain rude, disrespectful, or unreasonable language.
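For readers who want a concrete picture of the tool, the snippet below is a minimal Python sketch of how a single message might be scored. The endpoint and response fields follow Google's public Perspective API documentation; the placeholder key and the example sentence are illustrative.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; a real key comes from Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str, language: str) -> float:
    """Return Perspective's TOXICITY score (0 to 1) for one message."""
    payload = {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity_score("You are completely wrong about this.", "en"))
```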
Flagged four times more often
The team analyzed millions of tweets and thousands of Wikipedia summaries across multiple languages, demonstrating that toxicity scores for German-language texts are consistently higher than those for other languages. In one experiment, the same tweets were translated from German to English and re-evaluated using the API. On average, the German versions scored 8.5% higher in toxicity than their English translations, despite carrying identical meaning. “This would lead to a major imbalance in user moderation and a potential bias in research, considering the importance of the API,” Nogara said.
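A stripped-down version of that comparison, reusing the toxicity_score helper sketched above and assuming the tweets have already been translated, could look like the following. The pairs below are invented, and the exact aggregation behind the study's 8.5% figure is an assumption here.

```python
# Assumes toxicity_score() from the earlier sketch.
# Hypothetical (German original, English translation) pairs; the study
# worked with real tweets translated from German to English.
tweet_pairs = [
    ("Das ist völlig falsch.", "That is completely wrong."),
    ("Ich stimme dir überhaupt nicht zu.", "I do not agree with you at all."),
]

relative_diffs = []
for german, english in tweet_pairs:
    de_score = toxicity_score(german, "de")
    en_score = toxicity_score(english, "en")
    # Relative increase of the German score over the English score.
    relative_diffs.append((de_score - en_score) / en_score)

print(f"Mean relative difference: {sum(relative_diffs) / len(relative_diffs):.1%}")
```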
Ultimately, the study found that using Perspective API’s default moderation threshold could result in German-language users being flagged or censored four times more often than English speakers for identical content. “The language bias could lead to potential discrimination in removing content or suspending users, for the wrong reason,” Nogara said. “This is unfair and would infringe users’ freedom of expression for a wrong linguistic reason.”
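In moderation terms, the fourfold gap amounts to comparing the share of messages that cross a flagging threshold in each language. The sketch below assumes a threshold of 0.7, a value Perspective's documentation suggests as a starting point for moderation, and uses made-up score distributions purely to illustrate the arithmetic.

```python
def flag_rate(scores: list[float], threshold: float = 0.7) -> float:
    """Fraction of messages whose toxicity score meets the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)

# Hypothetical scores for the same ten messages in each language.
german_scores = [0.72, 0.81, 0.45, 0.90, 0.76, 0.71, 0.73, 0.84, 0.62, 0.78]
english_scores = [0.55, 0.71, 0.30, 0.68, 0.52, 0.48, 0.21, 0.73, 0.41, 0.57]

print(f"German flagged:  {flag_rate(german_scores):.0%}")   # 80%
print(f"English flagged: {flag_rate(english_scores):.0%}")  # 20%
```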
The reason still unknown
To explore the mechanisms behind the bias, the team examined the impact of specific linguistic features. For instance, the researchers tested whether certain German terms, such as “die,” which is the definite article “the” in German but reads as the verb “die” in English, might be driving the high scores. However, experiments showed that no specific terms were responsible for the inflated scores.
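One way to probe such a term-level effect, again reusing the toxicity_score helper from above, is a simple ablation: score a sentence with and without the suspect word. The probe sentences here are invented; the paper's tests were more systematic.

```python
# Assumes toxicity_score() from the earlier sketch.
# Swap the article "die" for "eine" and check whether the score moves.
with_term = toxicity_score("Die Katze schläft auf dem Sofa.", "de")
without_term = toxicity_score("Eine Katze schläft auf dem Sofa.", "de")
print(f"with 'die': {with_term:.2f}, without: {without_term:.2f}")
```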
Additionally, the researchers checked whether the bias could stem from special characters, since German uses extended Latin letters such as ä, ö, and ü. To test this, they compared the results to texts in Arabic, Chinese, and Japanese, languages written in entirely different, non-Latin scripts. These texts did not show the same spike in toxicity scores, ruling out the character set as the cause.
Going forward, the team plans to expand their work by analyzing text from more diverse social media platforms and sources beyond X/Twitter and examining other Perspective API attributes like “Insult” and “Profanity.” They are also considering extending the study to test whether similar German-language discrimination occurs within other LLM-based models for toxicity detection.
ISI Research Assistant Professor Luca Luceri contributed to the research from its conception, conducting preliminary investigations into toxicity in multilingual online discussions, through its development, supporting the creation of the analytical framework used to evaluate biases in toxicity detection models.
Published on August 19th, 2025
Last updated on August 19th, 2025