The ROUGE metric.
ROUGE is a metric for evaluating the quality of summaries based on unigram (rouge_1) or bigram (rouge_2) matches, or on the longest common subsequence (rouge_l).
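The three varieties can be illustrated with a minimal sketch. It assumes whitespace tokenization, no stemming, and an F1-style score; the actual metric may tokenize differently or report precision or recall instead, so the helper names below are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, target_total, reference_total):
    """Combine precision and recall into F1 (0.0 when nothing overlaps)."""
    if overlap == 0:
        return 0.0
    precision = overlap / target_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(target, reference, n):
    """N-gram overlap score: n=1 mirrors rouge_1, n=2 mirrors rouge_2."""
    t, r = ngrams(target.split(), n), ngrams(reference.split(), n)
    overlap = sum((t & r).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(t.values()), sum(r.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(target, reference):
    """Longest-common-subsequence score, mirroring rouge_l."""
    t, r = target.split(), reference.split()
    return f1(lcs_length(t, r), len(t), len(r))
```

For the target "the cat sat on the mat" and the reference "the cat lay on the mat", five of six unigrams and three of five bigrams match, and the longest common subsequence has five tokens.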
Configuration
Configuration for the parameters of the ROUGE metric:
title: "RougeConfig"
type: "object"
$defs:
  TokenizerSpec:
    type: "object"
    properties:
      name:
        type: "string"
        description: "Name of the tokenizer to be used."
      config:
        type: "object"
        description: >
          Any additional configuration that the tokenizer needs.
          This will be in the form specified by the JSONSchema for that tokenizer.
    required:
      - "name"
      - "config"
properties:
  variety:
    type: "string"
    description: "Name of the metric variety to be used."
    enum:
      - "rouge_1"
      - "rouge_2"
      - "rouge_l"
  use_stemmer:
    type: "boolean"
    description: "Whether to use a stemmer."
  multi_ref_agg:
    type: "string"
    description: "How to aggregate scores (max, mean, or min) when multiple references are given."
    enum:
      - "max"
      - "mean"
      - "min"
  tokenizer:
    $ref: "#/$defs/TokenizerSpec"
required:
  - "variety"
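As an illustration, a configuration matching this schema might look like the dictionary below. The values are hypothetical, and the tokenizer name "whitespace" in particular is an assumption, not a name confirmed by the schema. The `check_config` helper is a hand-written sketch of the schema's constraints, not part of any library.

```python
# Hypothetical example values; only "variety" is required by the schema.
config = {
    "variety": "rouge_l",
    "use_stemmer": True,
    "multi_ref_agg": "max",
    # "whitespace" is an assumed tokenizer name used purely for illustration.
    "tokenizer": {"name": "whitespace", "config": {}},
}

# Enumerated values copied from the schema above.
ALLOWED = {
    "variety": {"rouge_1", "rouge_2", "rouge_l"},
    "multi_ref_agg": {"max", "mean", "min"},
}


def check_config(cfg):
    """Return a list of problems; an empty list means the config looks valid."""
    problems = []
    if "variety" not in cfg:
        problems.append("missing required field: variety")
    for field, allowed in ALLOWED.items():
        if field in cfg and cfg[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    if "use_stemmer" in cfg and not isinstance(cfg["use_stemmer"], bool):
        problems.append("use_stemmer must be a boolean")
    tokenizer = cfg.get("tokenizer")
    if tokenizer is not None:
        for key in ("name", "config"):
            if key not in tokenizer:
                problems.append(f"tokenizer is missing required field: {key}")
    return problems
```

A full JSON Schema validator would cover more cases (for example, type checks on every field); this sketch only mirrors the required fields and enumerations listed above.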
Data
Accepted data format of the ROUGE metric:
title: "RougeData"
type: "object"
properties:
  target:
    type: "string"
    description: "Input text to evaluate."
  references:
    type: "array"
    description: "Gold reference texts"
    items:
      type: "string"
required:
  - "target"
  - "references"
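The sketch below shows one data item in this format and how the multi_ref_agg setting could combine per-reference scores into a single score for the target. The scorer is a deliberately simple stand-in (distinct-word recall), not ROUGE itself, and the default of "max" is an assumption, since the schema does not state a default.

```python
from statistics import mean

# Hypothetical data item matching the RougeData schema.
item = {
    "target": "the cat sat on the mat",
    "references": ["the cat lay on the mat", "a dog sat on a mat"],
}

# One aggregation function per allowed multi_ref_agg value.
AGGREGATORS = {"max": max, "mean": mean, "min": min}


def unigram_recall(target, reference):
    """Stand-in scorer: fraction of distinct reference words found in the target."""
    ref_words = set(reference.split())
    if not ref_words:
        return 0.0
    return len(ref_words & set(target.split())) / len(ref_words)


def score_item(item, multi_ref_agg="max", scorer=unigram_recall):
    """Score the target against each reference, then aggregate the scores."""
    scores = [scorer(item["target"], ref) for ref in item["references"]]
    return AGGREGATORS[multi_ref_agg](scores)
```

Here the target covers four of five distinct words in the first reference and three of five in the second, so "max" rewards the closest reference while "min" penalizes the weakest match.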
Results
Format of the results of the ROUGE metric:
title: "RougeResult"
type: "object"
$defs:
  OverallResult:
    type: "object"
    properties:
      value:
        type: "number"
        description: "Average ROUGE score of all target texts."
    required:
      - "value"
  ExampleResult:
    type: "object"
    properties:
      value:
        type: "number"
        description: "Example-level ROUGE Score"
    required:
      - "value"
properties:
  overall:
    $ref: "#/$defs/OverallResult"
  examples:
    type: "array"
    items:
      $ref: "#/$defs/ExampleResult"
required:
  - "overall"
  - "examples"
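To make the result shape concrete, the sketch below assembles a RougeResult-shaped dictionary from a list of per-example scores, taking the overall value to be their arithmetic mean as the schema description states. The function name `build_result` is hypothetical, and rejecting an empty list is an assumption about how that edge case should be handled.

```python
def build_result(example_scores):
    """Assemble a RougeResult-shaped dict from per-example scores."""
    if not example_scores:
        # Assumed behavior: an average over zero examples is undefined.
        raise ValueError("at least one example score is required")
    overall = sum(example_scores) / len(example_scores)
    return {
        "overall": {"value": overall},
        "examples": [{"value": score} for score in example_scores],
    }
```

For instance, `build_result([0.5, 1.0])` yields an overall value of 0.75 alongside the two example-level entries, in input order.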