Support for Critique will be discontinued at the end of August 2023. Please contact us if you have any questions.
Advances in generative AI technology have made it easier than ever to deploy amazing apps. You can translate between languages, quickly get the gist of an article, or generate images of anything you want. However, this technology is still not perfect, and it is hard to predict when and how it will fail once you put your apps into production.
Enter Critique! Critique is a state-of-the-art quality monitoring tool for generative AI that makes it simple to detect potential failures and remedy them. How simple? Below is an example of how to use the Critique client to detect whether a text is written in fluent and natural English.
import os

from inspiredco.critique import Critique

# Create a client using your Inspired Cognition API key.
client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

# Each datapoint's "target" is the text to be assessed.
dataset = [
    {"target": "This is a really nice test sentence."},
    {"target": "This sentence not so good."},
]

# Score each text for fluency using the uni_eval metric.
results = client.evaluate(
    metric="uni_eval",
    config={"task": "summarization", "evaluation_aspect": "fluency"},
    dataset=dataset,
)

for datapoint, result in zip(dataset, results["examples"]):
    print(f"Text: {datapoint['target']}, Fluency: {result['value']}")
The above code will identify the first sentence as being fluent and natural, and the second sentence as not:
Text: This is a really nice test sentence., Fluency: 0.939887774198021
Text: This sentence not so good., Fluency: 0.3681649020330465
To try it yourself, you can download the client, or use the playground on the Inspired Cognition Dashboard directly from your web browser.
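If you want to run the client locally, it can be installed with pip. The package name below is inferred from the import path in the example above, so check the getting started tutorial if it differs:

pip install inspiredco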
So what are some of the things that you can do with Critique?
- Output Filtering: Warn your users when the generation might be bad, or avoid showing it to them at all (see the sketch after this list).
- System Comparison: Compare the performance of multiple systems to decide which one to use.
- System Monitoring: Monitor outputs of your production system to make sure that there are no quality regressions or underperforming user segments.
- Selective Annotation: Send flagged outputs to annotators to improve your system performance.
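As a concrete example of output filtering, you can score each candidate output and only show the ones that clear a quality threshold. The sketch below reuses the fluency evaluation from the example above; the 0.5 cutoff is purely illustrative, and you would tune it for your own application.

import os

from inspiredco.critique import Critique

client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

# Candidate outputs from your generation system that you want to screen.
candidates = [
    {"target": "This is a really nice test sentence."},
    {"target": "This sentence not so good."},
]

results = client.evaluate(
    metric="uni_eval",
    config={"task": "summarization", "evaluation_aspect": "fluency"},
    dataset=candidates,
)

# Only surface outputs whose fluency score clears the (illustrative) threshold.
threshold = 0.5
for datapoint, result in zip(candidates, results["examples"]):
    if result["value"] >= threshold:
        print(f"OK to show: {datapoint['target']}")
    else:
        print(f"Flagged for review: {datapoint['target']}")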
Critique can assess generative AI systems using many different criteria. To give just a few examples, it can assess:
- Translation Quality: Assess the quality of translations from one language to another.
- Summarization Quality: Assess how well one text summarizes another.
- Toxicity: Detect whether a text is toxic or offensive.
- Fluency: Detect whether a text is natural and fluent.
- Factual Consistency: Detect whether a text is consistent with the facts stated in another text (see the sketch below).
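Switching between these criteria is mostly a matter of changing the metric and config you pass to evaluate. Below is a minimal sketch of what a factual consistency check might look like; the "consistency" aspect and the "source" field are assumptions for illustration, so consult the supported metrics documentation for the exact names.

import os

from inspiredco.critique import Critique

client = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

# Hypothetical factual consistency check: the "consistency" aspect and the
# "source" field are assumed names; see the supported metrics docs for details.
dataset = [
    {
        "source": "The meeting was moved from Tuesday to Thursday.",
        "target": "The meeting now takes place on Thursday.",
    },
]

results = client.evaluate(
    metric="uni_eval",
    config={"task": "summarization", "evaluation_aspect": "consistency"},
    dataset=dataset,
)

for datapoint, result in zip(dataset, results["examples"]):
    print(f"Text: {datapoint['target']}, Consistency: {result['value']}")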
If this has piqued your interest, check out our getting started tutorial, example use cases, supported assessment criteria, and supported metrics.
If you have any questions, get in touch with us through our support chat, Discord channel, or email using the buttons on the left, and we'll be happy to help out!