This page describes how you can use Critique to assess whether the facts in one text are consistent with or supported by the facts in another text.
What is Factual Consistency?
Factual consistency refers to whether the facts contained in one text are consistent with or supported by the facts in another text. For example, consider the following document and summary:
Document:
"The global population is estimated to reach 9.7 billion by 2050, according to the United Nations. The population growth is expected to occur primarily in developing countries, where the majority of the world's population currently resides. In contrast, the population in developed countries is projected to remain relatively stable. By 2050, it is estimated that nearly 70% of the world's population will live in urban areas."
Summary:
"The global population is estimated to reach 9.7 billion by 2025, with 70% living in urban areas."
The summary is not consistent with the facts in the document: it states that the population estimate will be reached by 2025, while the document says 2050.
How to use the Critique API
The Critique API supports factual consistency evaluation. Prepare your data as follows, with the text to be evaluated (here, the summary) as the target and the reference text (here, the document) as the source:
dataset = [
    {
        # "target" is the text being evaluated (here, the summary)
        "target": "The global population is estimated to reach 9.7 billion by 2025, with 70% living in urban areas.",
        # "source" is the reference text it should be consistent with (here, the document)
        "source": "The global population is estimated to reach 9.7 billion by 2050, according to the United Nations. The population growth is expected to occur primarily in developing countries, where the majority of the world's population currently resides. In contrast, the population in developed countries is projected to remain relatively stable. By 2050, it is estimated that nearly 70% of the world's population will live in urban areas."
    }
]
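If you have many document–summary pairs, the dataset can also be built programmatically. The sketch below is only illustrative; the documents and summaries lists are hypothetical placeholders for your own data:

# Hypothetical parallel lists: reference documents and the summaries to evaluate.
documents = ["First reference document ...", "Second reference document ..."]
summaries = ["First summary to evaluate ...", "Second summary to evaluate ..."]

# Pair each summary (target) with its reference document (source).
dataset = [
    {"target": summary, "source": document}
    for document, summary in zip(documents, summaries)
]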
Then choose a suitable metric and config setting:
metric = "bart_score"
{
"variety": "source_to_target",
"model": "facebook/bart-large-cnn",
"language": "eng"
}
Finally, you can evaluate your dataset using the Critique API:
import os

from inspiredco import critique

client = critique.Critique(api_key=os.environ["INSPIREDCO_API_KEY"])
result = client.evaluate(metric=metric, config=config, dataset=dataset)
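Once the evaluation finishes, you can inspect the scores. The snippet below is a rough sketch; the response fields used here ("overall", "examples", "value") are assumptions about the returned structure, so check the Critique API reference for the exact format:

# NOTE: the field names below are assumptions about the response structure.
print("Overall factual consistency score:", result["overall"]["value"])
for i, example in enumerate(result["examples"]):
    print(f"Example {i}: {example['value']}")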
Various Metrics/Configurations for Factual Consistency Evaluation
So far, Critique supports the following metrics for factual consistency evaluation: