This page lists some of the most popular API-based AI systems for common tasks. There are many good options, and it can be hard to pick the one best suited to your use case. If you’d like advice on which one to choose, reach out to us through the form below and we’d be happy to help!
Also, if you notice that we’re missing a popular API-based AI system, or that any of the information below is out of date, please get in touch at any time!
Image/Video Recognition
These APIs perform image or video recognition, i.e. they take an image or video as input and return a label or set of labels that describe the image or video. There are a number of tasks that they tend to focus on, including:
- Standard object detection: detecting a standard set of objects, such as people, cars, and animals.
- Custom object detection: detecting objects that are specified by the users.
- Face detection: detecting faces in images.
- Face recognition: comparing two faces to see if they are the same person.
- Content moderation: detecting inappropriate content, such as nudity or violence.
- Video analysis: detecting characteristics specific to video.
Service | Standard objects | Custom objects | Face detection | Face recognition | Content moderation | Video analysis | Notes |
---|---|---|---|---|---|---|---|
Amazon Rekognition | X | X | X | X | X | X | |
Imagga | X | X | X | X | X | | Supports on-premise deployment |
Clarifai | X | X | X | X | |||
Google Cloud Vision | X | X | |||||
Microsoft Azure Computer Vision | X | X | X | X | | | Extensive certification for government work |
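To make the input/output shape described above concrete (an image goes in, a set of labels comes out), here is a minimal sketch against Amazon Rekognition's `DetectLabels` operation, called through the `boto3` SDK. The helper names `detect_labels`, `labels_above`, and `main`, the 80% confidence threshold, and the file `photo.jpg` are hypothetical choices for illustration. A real run assumes `boto3` is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List


def detect_labels(client: Any, image_bytes: bytes, max_labels: int = 10) -> List[Dict[str, Any]]:
    """Send raw image bytes to Rekognition and return its list of labels.

    `client` is assumed to be a boto3 Rekognition client,
    e.g. boto3.client("rekognition").
    """
    response = client.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=max_labels)
    return response["Labels"]


def labels_above(labels: List[Dict[str, Any]], min_confidence: float = 80.0) -> List[str]:
    """Keep only the names of labels whose confidence (0-100) meets the threshold."""
    return [label["Name"] for label in labels if label["Confidence"] >= min_confidence]


def main() -> None:
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    client = boto3.client("rekognition")
    with open("photo.jpg", "rb") as f:  # hypothetical local image file
        labels = detect_labels(client, f.read())
    print(labels_above(labels))
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub object, which is useful when comparing services before committing to one.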
Natural Language Understanding
These APIs cover tasks that involve understanding natural language such as English. They can be used to perform tasks such as:
- Text classification: classifying a piece of text into a set of custom-specified categories.
- Sentiment analysis: a popular variety of text classification determining whether a piece of text is positive, negative, or neutral.
- Entity recognition: identifying entities in a piece of text, such as people, places, and organizations.
- Event recognition: identifying events in a piece of text, or “who did what to whom”.
- Syntax analysis: identifying the grammatical structure of a piece of text, such as the parts of speech and the syntactic dependencies between words.
- Search: finding text that matches a query.
- Summarization: condensing a longer text into a shorter one, either by extracting content or by generating a summary from scratch.
There are a number of options for each of these.
Service | Languages covered | Text classification | Sentiment | Entities | Events | Syntax | Search | Summarization | Notes |
---|---|---|---|---|---|---|---|---|---|
Amazon Comprehend | 12 languages | X | X | X | X | X | X | ||
Clarifai | English | X | X | X | |||||
Google Cloud Natural Language | 12 languages | X | X | X | X | | | | Has [healthcare-specific API](https://cloud.google.com/natural-language/healthcare/) |
IBM Watson Natural Language Understanding | 23 languages | X | X | X | X | X | X | | Supports on-premise deployment. |
Microsoft Azure Text Analytics | 1-115 languages (task dependent) | X | X | X | X | | | | Has medical text models. |
Aylien | 5 languages | X | X | X | | | | | Focuses on news processing. |
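As an illustration of two of the tasks listed above, sentiment analysis and entity recognition, the following sketch calls Amazon Comprehend's `DetectSentiment` and `DetectEntities` operations through `boto3`. The function name `analyze_text` and the shape of the returned dictionary are hypothetical. A real run assumes a client created with `boto3.client("comprehend")` and configured AWS credentials.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Return the overall sentiment and the (text, type) pairs of detected entities.

    `client` is assumed to be a boto3 Comprehend client.
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    return {
        # One of POSITIVE, NEGATIVE, NEUTRAL, or MIXED.
        "sentiment": sentiment["Sentiment"],
        # For example ("Seattle", "LOCATION").
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
    }
```

With a real client, `analyze_text(boto3.client("comprehend"), "some text")` would issue two API requests, one per task.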
Speech Processing
Speech recognition, or speech-to-text, converts spoken words into textual transcripts. Conversely, speech synthesis, or text-to-speech, converts text into spoken words.
- Google Cloud Speech-to-Text: Supports 68 languages and many dialects. Supports on-premise deployment.
- Microsoft Azure Speech Services: Broad multi-lingual and multi-dialect support. Supports on-premise deployment.
- Amazon Transcribe: Supports 23 languages.
- Rev.ai: Supports 36 languages.
Machine Translation
Machine translation is the task of translating text from one language to another.
- Google Cloud Translation: Supports 135 languages. Can train domain-specific models, and supports real-time audio translation.
- DeepL: Supports 28 languages.
- Microsoft Azure Translator: Supports 100+ languages. Also supports real-time speech translation.
- Amazon Translate: Supports 75 languages.
- Aylien: Supports 5 languages. Focused on processing news articles.
- Unbabel: Supports 31 languages. Focused on translating for customer service.
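For a sense of what calling one of these services looks like, here is a minimal sketch using Amazon Translate's `TranslateText` operation through `boto3`. The helper names `translate` and `translate_many` and the default language pair are hypothetical. A real run assumes a client created with `boto3.client("translate")` and configured AWS credentials.

```python
from typing import Any, List


def translate(client: Any, text: str, source: str = "en", target: str = "es") -> str:
    """Translate one string and return only the translated text.

    `client` is assumed to be a boto3 Translate client.
    """
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    return response["TranslatedText"]


def translate_many(client: Any, texts: List[str], source: str = "en", target: str = "es") -> List[str]:
    """Translate several strings one request at a time, preserving order."""
    return [translate(client, t, source, target) for t in texts]
```

Issuing one request per string keeps the sketch simple. For large volumes, a batch or asynchronous option from the chosen provider may be a better fit.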
Optical Character Recognition (OCR)
Optical character recognition (OCR) is the process of converting images of text into machine-readable text. It is also often important to recognize the format of the text, particularly for structured documents like tables.