Mistral launches a moderation API

AI startup Mistral has launched a new API for content moderation.

The API, the same one that powers moderation in Mistral’s Le Chat chatbot platform, can be tailored to specific applications and safety standards, Mistral says. It’s powered by a fine-tuned model (Ministral 8B) trained to classify text in a range of languages, including English, French, and German, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information.
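
In practice, screening text with such an endpoint might look something like the minimal sketch below. The mistralai Python package, the classifiers.moderate method, the mistral-moderation-latest model alias, and the response fields shown are assumptions for illustration, not details confirmed in Mistral’s announcement:

```python
# Hypothetical sketch; names below are assumptions, not confirmed by Mistral.
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Classify raw text; the service would return one result per input string,
# with a flag and a score for each of the nine policy categories.
response = client.classifiers.moderate(
    model="mistral-moderation-latest",  # model alias is an assumption
    inputs=["User-submitted text to screen before it is published."],
)

for result in response.results:
    print(result.categories)       # e.g. {"violence_and_threats": False, ...}
    print(result.category_scores)  # e.g. {"violence_and_threats": 0.01, ...}
```

An application would typically compare the per-category scores against its own thresholds, which is where the tailoring to specific safety standards would come in.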

The moderation API can be applied to either raw or conversational text, Mistral says.
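
Handling conversational text would presumably mean scoring a message in the context of the surrounding exchange, which is what lets a classifier catch model-generated harms like unqualified advice rather than just offensive user input. A minimal sketch, again assuming a moderate_chat method and a chat-style message format that Mistral has not confirmed:

```python
# Hypothetical sketch of the conversational mode: `moderate_chat` and the
# message format are assumptions, mirroring the raw-text call above.
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Score the assistant's reply in the context of the user's question; this
# is how model-generated harms (e.g. unqualified financial advice) would
# be flagged.
response = client.classifiers.moderate_chat(
    model="mistral-moderation-latest",
    inputs=[
        {"role": "user", "content": "Which stock should I put my savings in?"},
        {"role": "assistant", "content": "Go all in on a single penny stock."},
    ],
)

print(response.results[0].category_scores)  # expect a high "financial" score
```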

“Over the past few months, we’ve seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications,” Mistral wrote in a blog post. “Our content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to model safety by addressing model-generated harms such as unqualified advice and PII.”

AI-powered moderation systems are useful in theory. But they’re also susceptible to the same biases and technical flaws that plague other AI systems.

For example, some models trained to detect toxicity flag phrases in African-American Vernacular English (AAVE), the variety of English spoken by many Black Americans, as disproportionately “toxic.” Studies have also found that social media posts about people with disabilities are often flagged as more negative or toxic by widely used, publicly available sentiment and toxicity detection models.

Mistral claims that its moderation model is highly accurate, but it also concedes the model is a work in progress.

“We’re working with our customers to build and share scalable, lightweight, and customizable moderation tooling,” the company said, “and will continue to engage with the research community to contribute safety advancements to the broader field.”
