The BLEU metric.
BLEU is a classic metric that measures how similar a target text is to one or more reference texts, based on the overlap of their n-grams.
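As a rough illustration of the idea, the sketch below computes an unsmoothed BLEU score for one target. It clips each n-gram count by its maximum count in any single reference, takes the geometric mean of the n-gram precisions, and multiplies by a brevity penalty. Whitespace tokenization, a score in [0, 1], choosing the closest reference length for the penalty, and the helper name `sentence_bleu` are all assumptions made for this example, not necessarily what this metric does.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(target: str, references: list[str], max_ngram_order: int = 4) -> float:
    """Unsmoothed BLEU of one target against several references (hypothetical helper)."""
    # Assumption: plain whitespace tokenization instead of a configurable tokenizer.
    hyp = target.split()
    refs = [r.split() for r in references]

    log_precision_sum = 0.0
    for n in range(1, max_ngram_order + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            max_ref_counts |= ngrams(ref, n)  # union keeps the max count per n-gram
        correct = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if correct == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(correct / total)

    # Brevity penalty against the reference length closest to the target length
    # (ties go to the shorter reference).
    sys_len = len(hyp)
    ref_len = min((abs(len(r) - sys_len), len(r)) for r in refs)[1]
    bp = 1.0 if sys_len > ref_len else math.exp(1 - ref_len / sys_len)
    # Geometric mean of the precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_ngram_order)
```

For example, `sentence_bleu("the cat", ["the cat sat on the mat"], max_ngram_order=2)` has perfect precisions but is penalized for being much shorter than the reference.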
Configuration
Configuration parameters of the BLEU metric:
title: "BleuConfig"
type: "object"
$defs:
  TokenizerSpec:
    type: "object"
    properties:
      name:
        type: "string"
        description: "Name of the tokenizer to be used."
      config:
        type: "object"
        description: >
          Any additional configuration that the tokenizer needs.
          This will be in the form specified by the JSON Schema for that tokenizer.
    required:
      - "name"
      - "config"
properties:
  max_ngram_order:
    type: "integer"
    description: "The maximum n-gram order to count; n-grams of order 1 through max_ngram_order are used."
  smooth_method:
    type: "string"
    description: "The method used to smooth counts before calculating BLEU."
    enum:
      - "none"
      - "add_k"
      - "floor"
      - "exp"
  smooth_value:
    type: "number"
    description: "The value to use for smoothing; how it is applied depends on smooth_method."
  output_spans:
    type: "boolean"
    description: >
      Whether to output information about the quality of individual character
      spans within each target text.
  tokenizer:
    $ref: "#/$defs/TokenizerSpec"
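The smooth_method options resemble those of the sacrebleu library (floor, add-k, exp). The sketch below follows sacrebleu's smoothing scheme as an assumption; the metric documented here may apply them differently. It returns precisions as fractions rather than percentages, and the function name `smoothed_precisions` is hypothetical. Note how add_k changes the counts themselves, which is why correct and total in the results may hold adjusted values.

```python
def smoothed_precisions(correct, total, smooth_method="none", smooth_value=0.0):
    """Return (correct, total, precisions) after smoothing (hypothetical helper).

    correct[i] and total[i] are the counts for n-grams of order n = i + 1.
    """
    if smooth_method not in ("none", "add_k", "floor", "exp"):
        raise ValueError(f"unknown smooth_method: {smooth_method}")
    correct, total = list(correct), list(total)  # copy; do not mutate the inputs
    precisions = [0.0] * len(correct)
    exp_divisor = 1.0
    for i in range(len(correct)):
        n = i + 1
        if smooth_method == "add_k" and n > 1:
            # add_k: add smooth_value to both counts for every order above 1.
            correct[i] += smooth_value
            total[i] += smooth_value
        if total[i] == 0:
            break  # no n-grams of this order; remaining precisions stay 0
        if correct[i] == 0:
            if smooth_method == "exp":
                # exp: the k-th zero precision becomes 1 / (2**k * total).
                exp_divisor *= 2
                precisions[i] = 1.0 / (exp_divisor * total[i])
            elif smooth_method == "floor":
                # floor: treat a zero count as smooth_value.
                precisions[i] = smooth_value / total[i]
            # "none" (and add_k for unigrams): precision stays 0, so BLEU is 0.
        else:
            precisions[i] = correct[i] / total[i]
    return correct, total, precisions
```

For instance, with add_k and a smooth_value of 1, counts of [3, 0] correct out of [4, 3] become [3, 1] out of [4, 4].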
Data
Accepted data format of the BLEU metric:
title: "BleuData"
type: "object"
properties:
  target:
    type: "string"
    description: "Target text to evaluate."
  references:
    type: "array"
    description: "The references to evaluate the target against."
    items:
      type: "string"
required:
  - "target"
  - "references"
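A single data item is an object with one target string and a list of reference strings. The stdlib-only check below mirrors the BleuData schema so the format is easy to see; the function name `validate_bleu_data` and the example sentences are made up for illustration. In practice, a general JSON Schema validator (for example, `jsonschema.validate` from the jsonschema package) could check items against the schema directly.

```python
from typing import Any


def validate_bleu_data(item: Any) -> list[str]:
    """List the ways `item` violates the BleuData schema (empty list if valid)."""
    if not isinstance(item, dict):
        return ["item must be an object"]
    problems = []
    for key in ("target", "references"):  # both fields are required
        if key not in item:
            problems.append(f"missing required field: {key}")
    if "target" in item and not isinstance(item["target"], str):
        problems.append("target must be a string")
    if "references" in item:
        refs = item["references"]
        if not isinstance(refs, list):
            problems.append("references must be an array")
        elif not all(isinstance(r, str) for r in refs):
            problems.append("every reference must be a string")
    return problems


# A hypothetical valid item with two references.
example = {
    "target": "the cat sat on the mat",
    "references": ["the cat is on the mat", "there is a cat on the mat"],
}
```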
Results
Format of the results of the BLEU metric:
title: "BleuResult"
type: "object"
$defs:
  BleuSpan:
    type: "object"
    properties:
      value:
        type: "number"
        description: >
          A value corresponding to the maximum n-gram length matched by this token,
          divided by the maximum n-gram length overall.
      start:
        type: "number"
        description: "The span start position (in characters, inclusive)."
      end:
        type: "number"
        description: "The span end position (in characters, exclusive)."
    required:
      - "value"
      - "start"
      - "end"
  BleuStats:
    type: "object"
    properties:
      value:
        type: "number"
        description: "BLEU score: over the entire dataset for overall, or over a single example for each entry in examples."
      correct:
        type: "array"
        description: >
          List of counts of correct n-grams, for 1 <= n <= max_ngram_order.
          When smoothing is used, these are the adjusted counts after smoothing.
        items:
          type: "number"
      total:
        type: "array"
        description: >
          List of counts of total n-grams, for 1 <= n <= max_ngram_order.
          When smoothing is used, these are the adjusted counts after smoothing.
        items:
          type: "number"
      precisions:
        type: "array"
        description: "List of n-gram precisions, for 1 <= n <= max_ngram_order."
        items:
          type: "number"
      brevity_penalty:
        type: "number"
        description: "The brevity penalty."
      sys_len:
        type: "number"
        description: "The cumulative system (target) length."
      ref_len:
        type: "number"
        description: "The cumulative reference length."
      spans:
        type: "array"
        description: "Spans within the target text of an example, each with its matching score (see output_spans)."
        items:
          $ref: "#/$defs/BleuSpan"
    required:
      - "value"
properties:
  overall:
    $ref: "#/$defs/BleuStats"
  examples:
    type: "array"
    items:
      $ref: "#/$defs/BleuStats"
required:
  - "overall"
  - "examples"
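The sketch below shows how per-example counts could be assembled into a BleuResult-shaped dictionary. It assumes that overall is computed corpus-style, by summing n-gram counts and lengths across examples before computing precisions and the brevity penalty (as sacrebleu does), rather than by averaging example scores. Smoothing and spans are omitted. The helper names `bleu_stats` and `bleu_result` and the tuple input format are hypothetical.

```python
import math


def bleu_stats(correct, total, sys_len, ref_len):
    """Build a BleuStats-shaped dict from n-gram counts and lengths (no smoothing)."""
    precisions = [c / t if t > 0 else 0.0 for c, t in zip(correct, total)]
    if sys_len == 0:
        brevity_penalty = 0.0
    elif sys_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - ref_len / sys_len)
    if min(precisions) > 0:
        # Geometric mean of the precisions, scaled by the brevity penalty.
        value = brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    else:
        value = 0.0
    return {
        "value": value,
        "correct": list(correct),
        "total": list(total),
        "precisions": precisions,
        "brevity_penalty": brevity_penalty,
        "sys_len": sys_len,
        "ref_len": ref_len,
    }


def bleu_result(examples):
    """Assemble a BleuResult-shaped dict from (correct, total, sys_len, ref_len) tuples."""
    if not examples:
        raise ValueError("need at least one example")
    per_example = [bleu_stats(*ex) for ex in examples]
    order = len(examples[0][0])
    # Assumption: overall sums the raw statistics over all examples.
    overall = bleu_stats(
        [sum(ex[0][i] for ex in examples) for i in range(order)],
        [sum(ex[1][i] for ex in examples) for i in range(order)],
        sum(ex[2] for ex in examples),
        sum(ex[3] for ex in examples),
    )
    return {"overall": overall, "examples": per_example}
```

Under this assumption, two examples scoring 1.0 and 0.0 can yield an overall score of about 0.61 rather than their mean of 0.5, because pooled counts rarely leave a zero precision.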