The BERTScore metric.

BERTScore is a metric for evaluating generated text. It measures the similarity between embeddings of the generated text and of the reference texts, as computed by a BERT model.
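To make the idea concrete, the sketch below follows the greedy-matching formulation from the original BERTScore paper: each token embedding is matched to its most similar counterpart by cosine similarity, and the matches are averaged into precision, recall, and F-measure. The two-dimensional vectors and the function names `cosine` and `greedy_bertscore` are illustrative assumptions. A real run would use contextual embeddings from a BERT model, and this page does not state how this service implements the computation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bertscore(target_vecs, reference_vecs):
    """Hypothetical helper: greedy-matching precision, recall and F-measure."""
    # Precision: match every target token to its most similar reference token.
    precision = sum(
        max(cosine(t, r) for r in reference_vecs) for t in target_vecs
    ) / len(target_vecs)
    # Recall: match every reference token to its most similar target token.
    recall = sum(
        max(cosine(r, t) for t in target_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Toy stand-ins for token embeddings (assumed values, not real BERT output).
target = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
scores = greedy_bertscore(target, reference)
```

In this toy case the target contains one token that the reference does not cover, so precision drops while recall stays at its maximum.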

Configuration

Configuration for the parameters of the BERTScore metric:

title: "BertScoreConfig"
type: "object"
properties:
  variety:
    type: "string"
    description: "What variety of score to calculate (e.g. f_measure)"
    enum:
      - "f_measure"
      - "precision"
      - "recall"
  language:
    type: "string"
    pattern: "^[a-z]{3}$"
    description: >
      Three-letter code of the language in ISO 639-3 format
      (https://en.wikipedia.org/wiki/ISO_639-3).
      For example, English is 'eng'.
  model:
    type: "string"
    pattern: "^[a-z0-9-/]+$"
    description: "Model name."
  num_layers:
    type: "integer"
    minimum: 1
    description: >
      Use the Nth layer in the model (e.g. 8).
      Must be between 1 and the number of layers in the model.
  all_layers:
    type: "boolean"
    description: "Use all layers, not just the selected one."

For the model parameter, you can use the following:

  • bert-base-uncased

More models are coming soon. Please get in touch if you are interested in using a different model.
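As an illustration, a configuration can be checked on the client side before it is submitted. The following is a minimal sketch using only the Python standard library. The helper `validate_config` is hypothetical and not part of any official client, and it treats every property as optional because the schema above lists no required fields. It cannot check the upper bound on num_layers, which depends on the chosen model.

```python
import re

ALLOWED_VARIETIES = {"f_measure", "precision", "recall"}
LANGUAGE_PATTERN = r"[a-z]{3}"
# Same character set as the schema's model pattern; the hyphen is placed
# last so it is unambiguously a literal character.
MODEL_PATTERN = r"[a-z0-9/-]+"


def validate_config(config):
    """Hypothetical helper: return a list of problems found in a config dict."""
    errors = []
    if "variety" in config and config["variety"] not in ALLOWED_VARIETIES:
        errors.append("variety must be one of f_measure, precision, recall")
    if "language" in config and not re.fullmatch(LANGUAGE_PATTERN, str(config["language"])):
        errors.append("language must be a three-letter ISO 639-3 code")
    if "model" in config and not re.fullmatch(MODEL_PATTERN, str(config["model"])):
        errors.append("model contains characters outside a-z, 0-9, '-' and '/'")
    if "num_layers" in config:
        value = config["num_layers"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            errors.append("num_layers must be an integer of at least 1")
    if "all_layers" in config and not isinstance(config["all_layers"], bool):
        errors.append("all_layers must be a boolean")
    return errors


config = {
    "variety": "f_measure",
    "language": "eng",
    "model": "bert-base-uncased",
    "num_layers": 8,
    "all_layers": False,
}
problems = validate_config(config)
```

An empty list means the configuration passed every check in this sketch; the service may apply further validation of its own.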

Data

The BERTScore metric accepts data in the format below. Each query is limited to 2000 examples; to submit more, split them across multiple queries.

title: "BertScoreData"
type: "object"
properties:
  target:
    type: "string"
    description: "Target text to evaluate."
  references:
    type: "array"
    description: "The references to evaluate the target against."
    items:
      type: "string"
required:
  - "target"
  - "references"
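Datasets larger than the per-query limit have to be split before submission. The sketch below shows one way to do that in Python. The helpers `make_example` and `split_into_queries` are hypothetical names introduced for illustration, and the example texts are placeholders.

```python
MAX_EXAMPLES_PER_QUERY = 2000  # size limit stated above


def make_example(target, references):
    """Build one example in the BertScoreData shape."""
    return {"target": target, "references": list(references)}


def split_into_queries(examples, limit=MAX_EXAMPLES_PER_QUERY):
    """Split a list of examples into consecutive chunks of at most `limit`."""
    return [examples[i:i + limit] for i in range(0, len(examples), limit)]


# Placeholder data: 4500 examples, each with a single reference.
examples = [make_example(f"output {i}", [f"reference {i}"]) for i in range(4500)]
batches = split_into_queries(examples)
batch_sizes = [len(batch) for batch in batches]
```

Each chunk can then be sent as its own query. The order of examples is preserved, so per-example results can be concatenated back in the same order.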

Results

Format of the results of the BERTScore metric:

title: "BertScoreResult"
type: "object"
$defs:
  BertScoreStats:
    type: "object"
    properties:
      value:
        type: "number"
        description: "The main BERTScore value."
      precision:
        type: "number"
        description: "Precision score."
      recall:
        type: "number"
        description: "Recall score."
      f_measure:
        type: "number"
        description: "F-measure score."
    required:
      - "value"
properties:
  overall:
    $ref: "#/$defs/BertScoreStats"
  examples:
    type: "array"
    items:
      $ref: "#/$defs/BertScoreStats"
required:
  - "overall"
  - "examples"
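A result in this shape can be read as an ordinary nested dictionary once parsed. The sketch below uses made-up numbers purely for illustration, and the helpers `example_values` and `lowest_scoring` are hypothetical. Only "value" is required in each stats object, so the other fields are read with a fallback.

```python
def example_values(result):
    """Return the main BERTScore value of every example, in order."""
    return [stats["value"] for stats in result["examples"]]


def lowest_scoring(result, k=1):
    """Return the indices of the k examples with the lowest main value."""
    ranked = sorted(enumerate(example_values(result)), key=lambda pair: pair[1])
    return [index for index, _ in ranked[:k]]


# Illustrative result with invented scores; not real service output.
sample_result = {
    "overall": {"value": 0.85, "precision": 0.84, "recall": 0.86, "f_measure": 0.85},
    "examples": [
        {"value": 0.90},
        {"value": 0.72},
        {"value": 0.93},
    ],
}

overall_value = sample_result["overall"]["value"]
# precision, recall and f_measure are optional, so fall back to None.
first_precision = sample_result["examples"][0].get("precision")
worst = lowest_scoring(sample_result, k=1)
```

Sorting examples by their main value is a quick way to find the outputs that most need inspection.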