Detoxify metric

Detoxify is a classification-based metric that detects several varieties of toxic text.

Configuration

Configuration of the parameters of the Detoxify metric:

title: "DetoxifyConfig"
type: "object"
properties:
  model:
    type: "string"
    pattern: "^[a-z0-9-/]+$"
    description: "Model name"
  language:
    type: "string"
    pattern: "^[a-z]{3}$"
    description: >
      Three-letter abbreviation of the language, in ISO 639-3 language code
      format (https://en.wikipedia.org/wiki/ISO_639-3).
      For example, English is 'eng'.

For the model parameter, you can use:

  • unitary/toxic-bert
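
For example, a configuration that selects the unitary/toxic-bert model for English input could look like the following (an illustrative instance of the schema above; how it is wrapped in a request depends on your client):

model: "unitary/toxic-bert"
language: "eng"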

Data

Accepted data format of the Detoxify metric. Note that there is a limit of 2000 examples per query; to submit more examples, split them across multiple queries.

title: "DetoxifyData"
type: "object"
properties:
  target:
    type: "string"
    description: "Input text to evaluate."
required:
  - "target"

Results

Format of the results of the Detoxify metric:

title: "DetoxifyResult"
type: "object"
$defs:
  DetoxifyStats:
    type: "object"
    properties:
      value:
        type: "number"
        description: "the confidence that the text is toxic"
      severe_toxic:
        type: "number"
        description: "the confidence that the text is severe toxic"
      obscene:
        type: "number"
        description: "the confidence that the text is obscene"
      threat:
        type: "number"
        description: "the confidence that the text is threat"
      insult:
        type: "number"
        description: "the confidence that the text is insult"
      identity_hate:
        type: "number"
        description: "the confidence that the text is identity hate"
properties:
  examples:
    type: "array"
    items:
      $ref: "#/$defs/DetoxifyStats"
required:
  - "examples"