Getting started with the Critique API is an easy two-step process: acquire an API key and use it to make a request to the Critique server!
Note that Critique is currently in public beta, so if you encounter any issues please let us know by getting in contact through any of the buttons on the left!
Acquiring an API Key
To use Critique, you’ll need to acquire an API key. You can do this by signing up for a free account on the Inspired Cognition Dashboard.
Once you have an account, you can go to the Profile page and click the “Refresh” button to generate a new API key.
You’ll want to put this key in a safe place (if you lose it, you’ll need to regenerate it). You can also save it to an environment variable like INSPIREDCO_API_KEY for convenience:
export INSPIREDCO_API_KEY="your_api_key_here"
Finally, you’ll want to install the inspiredco Python package, which you can do through pip:
pip install inspiredco
Preparing your Data
The first step is preparing the data that you want to send. Exactly how you prepare your request depends on the type of evaluation that you want to perform, but you will always need to specify the name of the metric you want to use, the configuration for that metric, and the dataset you want to process.
Here is a very simple example of a request that uses the count metric to count the number of tokens in each sentence in the dataset. The configuration specifies that the sentences should be tokenized (split into words) along whitespace boundaries.
metric = "count"
config = {"tokenizer": {"name": "whitespace", "config": {}}}
dataset = [
    {"target": "This is a test sentence."},
    {"target": "This is yet another test sentence."},
]
For other examples, see the Critique Assessment Criteria page and choose the metric that’s most appropriate for your use case.
Making a Critique Request
Once you have an API key and the client library installed, you can make a request to the Critique server. The easiest way to do this is to use the evaluate function. Using the dataset and configuration above, here is a complete script that counts the number of tokens in each example:
import os
from inspiredco.critique import Critique
client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])
metric = "count"
config = {"tokenizer": {"name": "whitespace", "config": {}}}
dataset = [
    {"target": "This is a test sentence."},
    {"target": "This is yet another test sentence."},
]
result = client.evaluate(metric=metric, config=config, dataset=dataset)
print(result)
This will send a request to the server and wait for the result. The result will be a Python dictionary with several fields:
- overall: A dictionary with the overall score for the dataset in the value field, and potentially other statistics.
- example: A list of dictionaries, one for each example in the dataset, which will also contain a score in value and possibly other statistics.
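For instance, after the script above has returned, you could pull out the overall score and the per-example scores like this (a minimal sketch based only on the fields described above):
# Overall dataset-level score
print("overall:", result["overall"]["value"])
# Per-example scores, one entry per item in the dataset
for i, example in enumerate(result["example"]):
    print(f"example {i}:", example["value"])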
Requesting a Job then Retrieving Results Later
The Critique client also has four other useful functions:
- submit_task: Submit a task to the Critique server and return the task ID.
- fetch_status: Retrieve the status of a task.
- fetch_result: Retrieve the result of a completed task.
- wait_for_result: Wait for a task to complete and return the result.
These allow you to do things like submit a task and then check back later to see if it’s finished, which is particularly useful if you want to run evaluation tasks on larger datasets.
Here is a code example that evaluates the fluency of some text, which can take a bit longer than just counting the number of tokens in a sentence:
import os
from inspiredco.critique import Critique
client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])
metric = "uni_eval"
config = {"task": "summarization", "evaluation_aspect": "fluency"}
dataset = [
    {"target": "This is a test sentence."},
    {"target": "This is yet another test sentence."},
]
task_id = client.submit_task(metric=metric, config=config, dataset=dataset)
status = client.fetch_status(task_id)
print(status) # this should say the task is "queued"
# Do some other work while the task is running...
result = client.wait_for_result(task_id)
print(result)
status = client.fetch_status(task_id)
print(status) # this should say the task has "succeeded"
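If you would rather check back yourself instead of blocking in wait_for_result, you can combine fetch_status and fetch_result. The sketch below assumes that the status returned by fetch_status mentions “succeeded” when the task is done, as in the printout above; adjust the check to match the status values you actually see:
status = client.fetch_status(task_id)
if "succeeded" in str(status):
    # The task is done, so retrieve the result without waiting
    result = client.fetch_result(task_id)
    print(result)
else:
    # Not finished yet; check back again later
    print("Task not finished yet:", status)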
Making Several Parallel Requests
It’s also possible to make several requests at once, and they will all be processed in parallel.
Let’s say we have several metrics that we want to evaluate on the same dataset. The example below does exactly that:
import os
from inspiredco.critique import Critique
client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])
dataset = [
{"target": "This is a test sentence."},
{"target": "This is yet another test sentence."},
]
metrics_and_configs = [
    (
        "count",
        {"tokenizer": {"name": "whitespace", "config": {}}}
    ),
    (
        "uni_eval",
        {"task": "summarization", "evaluation_aspect": "fluency"}
    )
]
# Submit all tasks
task_ids = [
    client.submit_task(metric=metric, config=config, dataset=dataset)
    for metric, config in metrics_and_configs
]
# Wait for all results
results = [
    client.wait_for_result(task_id)
    for task_id in task_ids
]
# Print the results
for result in results:
    print(result)
You could do something very similar if you had multiple datasets to evaluate at the same time, by looping over datasets instead of looping over metrics and configurations.
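As a rough sketch of that pattern, reusing the client and the count metric from above (the datasets here are just illustrative placeholders):
# Two placeholder datasets, evaluated with the same metric and configuration
datasets = [
    [{"target": "This is a sentence from the first dataset."}],
    [{"target": "This is a sentence from the second dataset."}],
]
metric = "count"
config = {"tokenizer": {"name": "whitespace", "config": {}}}
# Submit one task per dataset, then wait for all of the results
task_ids = [
    client.submit_task(metric=metric, config=config, dataset=dataset)
    for dataset in datasets
]
results = [client.wait_for_result(task_id) for task_id in task_ids]
for result in results:
    print(result)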
Next Steps
Critique can do a lot more than count words and measure the naturalness of text, so once you’ve tried out the examples above, head over to our Critique Use Cases page to find more examples, or to Critique Assessment Criteria to see what types of assessments you can perform with Critique.