This page describes the Detoxify metric. Detoxify is a classification-based metric that detects several varieties of toxic text.
Configuration
Configuration parameters for the Detoxify metric:
title: "DetoxifyConfig"
type: "object"
properties:
model:
type: "string"
pattern: "^[a-z0-9-/]+$"
description: "Model name"
language:
type: "string"
pattern: "^[a-z]{3}$"
description: >
"Three-letter abbreviation of the language in ISO 639-3 language code)"
"format: https://en.wikipedia.org/wiki/ISO_639-3\n"
"For example, English is 'eng'."
For the model parameter, you can use:
- unitary/toxic-bert
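A configuration that satisfies this schema could look as follows (a minimal sketch; the values are taken from the model list and language example above):

model: "unitary/toxic-bert"
language: "eng"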
Data
Accepted data format for the Detoxify metric. Note that there is a size limit of 2000 examples per query; if you want to submit more examples, split them across multiple queries.
title: "DetoxifyData"
type: "object"
properties:
target:
type: "string"
description: "Input text to evaluate."
required:
- "target"
Results
Format of the results of the Detoxify metric:
title: "DetoxifyResult"
type: "object"
$defs:
DetoxifyStats:
type: "object"
properties:
value:
type: "number"
description: "the confidence that the text is toxic"
severe_toxic:
type: "number"
description: "the confidence that the text is severe toxic"
obscene:
type: "number"
description: "the confidence that the text is obscene"
threat:
type: "number"
description: "the confidence that the text is threat"
insult:
type: "number"
description: "the confidence that the text is insult"
identity_hate:
type: "number"
description: "the confidence that the text is identity hate"
properties:
examples:
type: "array"
items:
$ref: "#/$defs/DetoxifyStats"
required:
- "examples"