Assessing text toxicity with Critique.

This page describes how you can use Critique to assess whether text is toxic or harmful.

What is Toxicity in Text?

Toxicity refers to harmful content in natural language text, such as content that is hateful, threatening, or otherwise offensive. Toxicity evaluation is an important problem with real-world applications in moderation, content filtering, and other areas. It is particularly important in the context of generative AI, because the output of generative AI systems is often not moderated as carefully as human-generated content.

How to use the Critique API

The Critique API supports toxicity evaluation. To use it, first prepare your data as a list of entries, each with a "target" field containing the text to evaluate:

dataset = [
    {
        "target": "This is a toxic comment",
    },
    {
        "target": "This is a non-toxic comment",
    }
]
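
If your texts come from another source, such as the outputs of a generation model, you can also build this list programmatically. The sketch below assumes a hypothetical outputs list of strings:

# Hypothetical list of texts to evaluate, e.g. collected model outputs.
outputs = [
    "This is a toxic comment",
    "This is a non-toxic comment",
]

# Wrap each text in the {"target": ...} format shown above.
dataset = [{"target": text} for text in outputs]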

Then choose a suitable metric and configuration:

metric = "detoxify"
config = {
    "model":"unitary/toxic-bert"
}

Finally, you can evaluate your dataset using the Critique API:

import os

from inspiredco import critique

client = critique.Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

result = client.evaluate(metric=metric, config=config, dataset=dataset)
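
The return value contains the toxicity scores for your dataset. As a minimal sketch, assuming the response includes an overall score and a per-example list of scores (check the Critique API reference for the exact field names), you could inspect it like this:

# The field names below ("overall", "examples", "value") are assumptions
# about the response shape; consult the API reference for the exact structure.
print(result["overall"]["value"])  # aggregate toxicity score for the dataset

for item, example in zip(dataset, result["examples"]):
    # Per-example toxicity score, paired with the original text.
    print(item["target"], example["value"])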

Supported Metrics and Configurations for Toxicity Evaluation

So far, Critique supports the following metrics for toxicity evaluation: