Improving AI systems with Critique through selective annotation.

This page describes how you can use Critique to improve your AI systems through selective annotation.

Selective Annotation for Improving AI Systems

One of the easiest ways to improve an AI system that you are training yourself is to annotate more data. However, data annotation is expensive, so annotating examples at random may not be the best use of your resources.

Instead, you can use Critique to identify outputs that are of lower quality and send these to human annotators, whose annotations can then be used to improve the system. This can be done in a straightforward way, where you send a random set of flagged outputs to annotators, or with a more sophisticated approach, where you try to balance the number of flagged outputs across user subsets, domains, and so on.
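As a rough illustration of these two strategies, the sketch below works on a plain list of already-scored outputs rather than on Critique's API. The field names `score` and `group`, the `threshold` cutoff, and both helper functions are hypothetical; the sketch assumes that lower scores mean lower quality, and `group` stands in for whatever user subset or domain you want to balance across.

```python
import random
from collections import defaultdict


def select_random_flagged(outputs, threshold, k, seed=0):
    """Flag outputs scoring below `threshold`, then sample up to k of them at random."""
    flagged = [o for o in outputs if o["score"] < threshold]
    rng = random.Random(seed)
    return rng.sample(flagged, min(k, len(flagged)))


def select_balanced(outputs, threshold, k_per_group):
    """Flag low-scoring outputs, then take up to k_per_group from each group.

    Within each group the lowest-scoring outputs are chosen first, so no
    single user subset or domain dominates the annotation budget.
    """
    by_group = defaultdict(list)
    for o in outputs:
        if o["score"] < threshold:
            by_group[o["group"]].append(o)
    selected = []
    for group in sorted(by_group):
        ranked = sorted(by_group[group], key=lambda o: o["score"])
        selected.extend(ranked[:k_per_group])
    return selected


# Toy scored outputs (made-up values for illustration only)
outputs = [
    {"id": 1, "group": "news", "score": -3.1},
    {"id": 2, "group": "news", "score": -2.9},
    {"id": 3, "group": "news", "score": -2.5},
    {"id": 4, "group": "sports", "score": -3.4},
    {"id": 5, "group": "sports", "score": -1.2},
    {"id": 6, "group": "finance", "score": -2.8},
]

print([o["id"] for o in select_random_flagged(outputs, threshold=-2.0, k=3)])
print([o["id"] for o in select_balanced(outputs, threshold=-2.0, k_per_group=1)])  # [6, 1, 4]
```

Sorting within each group before truncating means the balanced variant still prioritizes the worst outputs in every group, while the random variant trades that prioritization for a less biased sample of the whole flagged pool.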

Code Example for Selective Annotation

Let’s take an example: suppose you are building a system, ai_system, that summarizes articles. You have a large collection of articles in your domain of interest, stored in articles, and you can pay annotators to write good summaries for some of them. The following code asks the system to summarize every article in the collection, uses Critique to identify the summaries that are of lower quality, and prints out the corresponding articles as a list to be annotated.

from inspiredco.critique import Critique
import os

# Create a Critique client
client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

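# `ai_system` (your summarization system) and `articles` (a list of article
# strings) are placeholders for your own objects
summaries = [ai_system.summarize(article) for article in articles]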

# Run the Critique summarization evaluation model on the summaries
metric = "bart_score"
config = {
    "variety": "source_to_target",
    "model": "facebook/bart-large-cnn",
    "language": "eng",
}
dataset = [{"source": article, "target": summary} for article, summary in zip(articles, summaries)]
evaluation = client.evaluate(metric=metric, config=config, dataset=dataset)
for datapoint, evalpoint in zip(dataset, evaluation["examples"]):
    datapoint["evaluation"] = evalpoint["value"]

# Sort the summaries from lowest to highest score, and print out the 100
# lowest-scoring articles for annotation
dataset.sort(key=lambda x: x["evaluation"])
for bad_datapoint in dataset[:100]:
    print(bad_datapoint["source"])

Of course, the same approach works for other tasks with whatever assessment criterion you want to use. You can also visit the getting started doc to see how to get an API key or find other details.
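Because the selection step only needs one number per output, it can be written independently of the metric. The sketch below is a hypothetical helper, not part of the Critique library: `score_fn` could wrap a Critique metric or any other criterion, and the toy length-ratio scorer in the usage lines is purely illustrative, not a recommended quality measure.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_annotation(
    inputs: Sequence[str],
    outputs: Sequence[str],
    score_fn: Callable[[str, str], float],
    n: int,
) -> List[Tuple[str, str, float]]:
    """Return the n (input, output, score) triples with the lowest scores."""
    scored = [(src, out, score_fn(src, out)) for src, out in zip(inputs, outputs)]
    scored.sort(key=lambda triple: triple[2])  # lowest (worst) scores first
    return scored[:n]


# Illustrative stand-in criterion: ratio of output length to input length
def length_ratio(src: str, out: str) -> float:
    return len(out) / max(len(src), 1)


inputs = ["a long article about trains", "short note", "another fairly long article"]
outputs = ["trains", "short note", ""]
for src, out, score in select_for_annotation(inputs, outputs, length_ratio, n=2):
    print(f"{score:.2f}\t{src}")
```

Swapping in a different criterion only means passing a different `score_fn`; the ranking and truncation logic stays the same, provided lower scores consistently mean lower quality.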