Fundamentals of Natural Language Processing and AWS Comprehend

Natural language processing (NLP) is a branch of computer science that helps computers understand, interpret, and manipulate human language. NLP draws on many disciplines, including computer science and linguistics, in its aim to bridge the gap between human communication and computer comprehension.

NLP has gained importance and popularity because of the exponential rise in demand for communication between humans and machines, driven by the growing number of devices that support it. That progress has not come easily. It rests on a great deal of technical work, including the development of new algorithms, the refinement of existing ones, and the arrival of new tools and technologies such as big data, DevOps, and the Internet of Things. As these tools multiply, so does the amount of data and logs they generate, and because so many different types of data must be managed, new algorithms are needed to trace and validate it.

As human beings, we can speak and write in English, German, Spanish, Hindi, Chinese, or any other language. A computer's native language, known as machine code or machine language, is largely incomprehensible to most people. At your device's lowest level, communication happens not with words but with combinations of zeros and ones that produce logical actions. So when you speak or type, your device carries that input down to this base level, where NLP and other artificial intelligence (AI) and machine learning tools work out its structure and convert it into a form the computer can act on. In short, much of the analysis of language data is handled by NLP.
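To make the "zeros and ones" idea concrete, here is a small Python sketch showing how characters are stored as UTF-8 bytes, the encoding most modern systems use for text. The helper name `to_bits` is hypothetical and only for illustration.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit binary strings.

    `to_bits` is a hypothetical helper written for this illustration.
    """
    # Each character is first encoded to one or more UTF-8 bytes,
    # then each byte is rendered as exactly eight binary digits.
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


if __name__ == "__main__":
    print(to_bits("Hi"))  # 01001000 01101001
    print(to_bits("é"))   # 11000011 10101001 (one character, two bytes)
```

Plain ASCII letters take one byte each, while characters such as "é" need more than one, which is why text must be decoded with the right encoding before any language analysis can happen.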

NLP Functionalities

But why NLP?

Every day, humans speak thousands of words that other people interpret in countless ways. At its core this is simple communication, but we all know words run much deeper than that. We draw meaning and conclusions from everything someone says, whether they imply something through visual cues or through how often they mention something. While NLP doesn't specialize in voice inflection, it does draw on contextual patterns, and this is where it proves its worth. Modern techniques such as deep learning have also produced strong results in language modeling, parsing, and other natural-language tasks.

Here is an example of how powerful NLP is when applied to a practical situation. When you type on your phone, as many people do daily, you see word suggestions based on what you have already typed and what you are currently typing. That is natural language processing in action. It is so common that most people take it for granted, having treated keyboard suggestions as a normal feature for years, but that is exactly why NLP matters: the feature is built on it.
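Real keyboards use far more sophisticated (often neural) language models, but the core idea of predicting the next word from the words that came before can be sketched with a simple bigram model. The function names and the tiny corpus below are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter, defaultdict


def build_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often (hypothetical helper)."""
    following: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word immediately after it.
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(following: dict[str, Counter], previous_word: str, k: int = 3) -> list[str]:
    """Return up to k of the most frequent words seen after `previous_word`."""
    counts = following.get(previous_word.lower())
    if not counts:
        return []
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    corpus = [
        "see you tomorrow",
        "see you soon",
        "see you tomorrow morning",
        "thank you so much",
    ]
    model = build_bigrams(corpus)
    print(suggest(model, "you"))  # ['tomorrow', 'soon', 'so']
```

Counting only adjacent word pairs keeps the model tiny and easy to follow; the trade-off is that it ignores any context beyond the single previous word.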

Extending NLP into business, consider a company trying to decide how best to advertise to its users. It will likely use Google to find common search terms that users type when looking for its product. With NLP behind the process, those searches can be quickly compiled into terms associated with the brand, including ones the company would not have expected. Capitalizing on these uncommon terms could give the company the power to advertise in new ways.

NLP – Scope

The scope of natural language processing is not restricted to one domain; nearly every kind of business can benefit from it.

Let's look at some very basic applications of NLP in our everyday lives. Beyond conversing with virtual assistants such as Alexa or Siri, which are very much in demand, here are a few more examples:

  1. Look at the emails in your spam folder: do you notice similarities in the subject lines? That is where Bayesian spam filtering, a statistical NLP technique that compares the words in spam with those in valid emails to identify junk mail, comes in.
  2. Have you ever missed a phone call and read the automatic transcript of the voicemail in your email inbox or smartphone app? That is a very interesting application of NLP known as speech-to-text conversion.
  3. When you navigate a website using its built-in search bar, or select suggested topic, entity, or category tags, you are relying on NLP methods for search, topic modeling, entity extraction, and content categorization.
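The Bayesian spam filtering mentioned in the first example can be sketched as a tiny multinomial naive Bayes classifier. This is a simplified illustration with made-up training messages and a hypothetical class name; production filters train on much larger corpora and use additional features.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes filter with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase and keep runs of letters, digits, and apostrophes as words.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, words: list[str], label: str) -> float:
        # log P(label) + sum of log P(word | label); working in logs avoids underflow.
        vocab_size = len(set().union(*self.word_counts.values()))
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        # Assumes both labels have at least one training message (a zero prior has no log).
        words = self.tokenize(text)
        return max(self.LABELS, key=lambda label: self.log_score(words, label))


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    spam_filter.train("win a free prize now", "spam")
    spam_filter.train("free money claim your prize", "spam")
    spam_filter.train("meeting moved to monday", "ham")
    spam_filter.train("please review the project report", "ham")
    print(spam_filter.classify("claim your free prize"))  # spam
```

Add-one smoothing keeps a word never seen in one class from driving that class's probability to zero, which is what lets the filter compare spam and valid mail word by word.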

As this shows, NLP helps bridge the gap between computational linguistics and everyday human communication with devices, and many vendors have adopted it as the foundation for services they offer their clients. Examples include IBM, Amazon Web Services, Microsoft, GitHub, spaCy, Google, and many more.

Amazon Comprehend

Amazon Comprehend is a service provided by Amazon Web Services. It uses natural language processing (NLP) to understand the documents or text provided to it, and it operates on any text in UTF-8 format. It is a powerful tool for extracting insights by recognizing the entities, key phrases, language, sentiment, and other common elements in a document.
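Because Comprehend works on UTF-8 text and its real-time APIs limit the size of each document in bytes, a common preprocessing step is to split long text on character boundaries. The sketch below is a generic helper, not part of any AWS SDK; the name `chunk_utf8` and the byte limit you pass in are assumptions, so check the current Comprehend quotas for the actual per-API limit.

```python
def chunk_utf8(text: str, max_bytes: int) -> list[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes long.

    Hypothetical helper: it never cuts a multi-byte character in half,
    but it may split in the middle of a word.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        # Start a new chunk if this character would overflow the byte budget.
        if current_size + size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(ch)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks
```

Measuring the encoded size of each character, rather than counting characters, is the key design choice: a limit stated in bytes can be exceeded by text with far fewer characters than bytes.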

Amazon Comprehend is one of the best options to keep in mind when creating new products based on understanding the structure of documents. It offers numerous capabilities: for example, you can search social media feeds for mentions of products, scan an entire document repository for key phrases, or (with Comprehend Medical) identify the dosage, strength, and frequency of a specific medication in unstructured clinical notes. This has helped many companies analyze the unstructured data they receive more effectively. Before such tools existed, sorting this kind of data depended entirely on manual review, so this evolution has made a real difference.

Amazon Comprehend uses machine learning to help you uncover relationships in your unstructured data by extracting insights from it. The service can identify the language of the text; extract key phrases, places, people, brands, or events; determine how positive or negative the text is; analyze text using tokenization and parts of speech; and automatically organize a collection of text files by topic. AWS Comprehend also lets you use AutoML to build custom models for classifying and sorting text that can easily be tailored to your organization's requirements.
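Organizing a collection of files by topic runs as an asynchronous batch job rather than a single request. Here is a minimal sketch using the boto3 `comprehend` client's `start_topics_detection_job` and `describe_topics_detection_job` calls. The helper names, S3 URIs, IAM role ARN, and topic count are hypothetical placeholders, and the client is passed in so the logic can be exercised without AWS access.

```python
import time


def start_topic_job(client, input_s3_uri: str, output_s3_uri: str,
                    role_arn: str, num_topics: int = 10) -> str:
    """Start an asynchronous topic-modeling job and return its JobId (hypothetical helper)."""
    response = client.start_topics_detection_job(
        # Each file under the input prefix is treated as one document.
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,  # placeholder role that can read/write the buckets
        NumberOfTopics=num_topics,
    )
    return response["JobId"]


def wait_for_job(client, job_id: str, poll_seconds: float = 30.0) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    while True:
        props = client.describe_topics_detection_job(JobId=job_id)["TopicsDetectionJobProperties"]
        status = props["JobStatus"]
        if status in ("COMPLETED", "FAILED", "STOPPED"):
            return status
        time.sleep(poll_seconds)
```

With real credentials you would pass `boto3.client("comprehend")` as `client`; the job writes its topic results to the output S3 location rather than returning them directly.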

Features of AWS Comprehend

AWS Comprehend provides Entity Recognition, Key Phrase Extraction, Sentiment Analysis, Topic Modeling, and Language Detection APIs, so natural language processing can be easily integrated into applications. The concept is based entirely on the AWS API: your application calls the APIs and either passes the text directly or tells them where the data to be processed is located. The APIs then return results such as key phrases, entities, sentiment, and language in JSON format, which is easy to use in the application.
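The call-and-receive-JSON flow described above can be sketched with the AWS SDK for Python (boto3). The helper name `analyze_text` is hypothetical; the `detect_*` methods are real boto3 Comprehend client calls. The client is passed in so you can supply `boto3.client("comprehend")` in practice or a stub when testing.

```python
def analyze_text(client, text: str) -> dict:
    """Run Comprehend's real-time APIs on one document (hypothetical helper)."""
    # 1. Detect the dominant language; the other APIs need a LanguageCode.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    lang = max(languages, key=lambda item: item["Score"])["LanguageCode"]

    # 2. Reuse that language code for the remaining analyses.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=lang)
    key_phrases = client.detect_key_phrases(Text=text, LanguageCode=lang)["KeyPhrases"]
    entities = client.detect_entities(Text=text, LanguageCode=lang)["Entities"]
    tokens = client.detect_syntax(Text=text, LanguageCode=lang)["SyntaxTokens"]

    # 3. Keep only a few fields from each JSON response.
    return {
        "language": lang,
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [p["Text"] for p in key_phrases],
        "entities": [(e["Text"], e["Type"]) for e in entities],
        "syntax": [(t["Text"], t["PartOfSpeech"]["Tag"]) for t in tokens],
    }
```

The full responses also carry confidence scores and character offsets. If the detected language is not supported by a given API, that call raises an `UnsupportedLanguageException`, so production code would check the language code first.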

Some features are listed below:

  • Key Phrases– This API identifies the key phrases or talking points in a text, along with a confidence score indicating how likely each one is to be a key phrase.
  • Sentiment Analysis– This API returns the overall sentiment of a text (Positive, Negative, Neutral, or Mixed).
  • Syntax Analysis– This API enables customers to analyze text using tokenization and Parts of Speech (PoS), identifying word boundaries and labels such as nouns and adjectives within the text.
  • Entity Recognition– This API returns the named entities in the provided text (such as people, places, and organizations), automatically categorized by type.
  • Custom Entities– This is a customized capability AWS Comprehend provides for your specific domain. Using AutoML, Comprehend learns from a small private set of examples (for example, a list of a company's employee registration numbers and texts in which they appear) and trains a private, custom model to recognize them. Because the service is fully managed, there are no servers to manage and no algorithms to master.
  • Language Detection– This API automatically identifies text written in over 100 languages and returns the dominant language with a confidence score indicating how strongly that language dominates.
  • Multiple language support– AWS Comprehend can perform text analysis on English, French, German, Italian, Portuguese, and Spanish texts. With this range of supported languages, it is convenient to build applications that detect the language of a text, translate it into English, French, German, Italian, Portuguese, or Spanish with Amazon Translate, and then use Amazon Comprehend to perform text analysis.
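The detect, translate, and analyze pipeline from the last item can be sketched with boto3's `comprehend` and `translate` clients (`detect_dominant_language`, `translate_text`, and `detect_sentiment` are real client methods). The helper name and the `SUPPORTED` set are assumptions: the set simply mirrors the six languages listed above, and the current Comprehend documentation should be checked because its language list has grown over time.

```python
# Assumption: mirrors the six languages listed in this article; verify
# against the current Amazon Comprehend documentation.
SUPPORTED = {"en", "fr", "de", "it", "pt", "es"}


def sentiment_any_language(comprehend, translate, text: str, target: str = "en") -> dict:
    """Detect the language, translate to `target` if needed, then detect sentiment.

    Hypothetical helper; `comprehend` and `translate` are boto3 clients or stubs.
    """
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    source = max(languages, key=lambda item: item["Score"])["LanguageCode"]

    if source not in SUPPORTED:
        # Amazon Translate returns the translation under "TranslatedText".
        text = translate.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode=target
        )["TranslatedText"]
        analysis_lang = target
    else:
        analysis_lang = source

    result = comprehend.detect_sentiment(Text=text, LanguageCode=analysis_lang)
    return {"source_language": source, "analyzed_in": analysis_lang,
            "sentiment": result["Sentiment"]}
```

Translating only when needed saves a Translate call (and any nuance lost in translation) for text Comprehend can already analyze directly.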

Beyond these, AWS Comprehend offers a separate set of features exclusively for the medical domain under the name Comprehend Medical. Its Medical Named Entity and Relationship Extraction (NERe) API returns medical information such as medications, medical conditions, tests, treatments, and procedures (TTP), anatomy, and Protected Health Information (PHI). It also identifies relationships between extracted sub-types associated with medications and TTP. The Medical Ontology Linking APIs recognize medical information and link it to codes and concepts in standard medical ontologies. Medical conditions are linked to ICD-10-CM codes (e.g. “headache” is linked to the “R51” code) with the InferICD10CM API, while medications are linked to RxNorm codes (“Acetaminophen / Codeine” is linked to the “C2341132” Cui) with the InferRxNorm API. The Medical Ontology Linking APIs also detect contextual information as entity traits (e.g. negation).
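The ontology-linking step can be sketched with boto3's `comprehendmedical` client, whose `infer_icd10_cm` and `infer_rx_norm` methods correspond to the InferICD10CM and InferRxNorm APIs. The helper names are hypothetical, and keeping only the highest-scoring concept per entity is a simplifying choice made here, not the service's own behavior.

```python
def top_codes(entities: list[dict], concepts_key: str) -> list[tuple[str, str, str]]:
    """For each entity, keep its highest-scoring linked concept (hypothetical helper)."""
    results = []
    for entity in entities:
        concepts = entity.get(concepts_key, [])
        if concepts:
            best = max(concepts, key=lambda c: c["Score"])
            results.append((entity["Text"], best["Code"], best["Description"]))
    return results


def link_ontologies(client, text: str) -> dict:
    """Link conditions to ICD-10-CM and medications to RxNorm via Comprehend Medical."""
    icd_entities = client.infer_icd10_cm(Text=text)["Entities"]
    rx_entities = client.infer_rx_norm(Text=text)["Entities"]
    return {
        "conditions": top_codes(icd_entities, "ICD10CMConcepts"),
        "medications": top_codes(rx_entities, "RxNormConcepts"),
    }
```

In practice you would pass `boto3.client("comprehendmedical")` as `client`. Each response lists several candidate codes with confidence scores, which you may prefer to review rather than discard.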

AWS Comprehend has been very successful with the services it provides and has won many trusted customers by fully meeting their requirements. It is not the only such service on the market and faces tough competition, yet it remains one of the most popular because of its security assurances, efficiency, and ability to run multiple services together. If you're planning to work on data analytics or any kind of log filtering, AWS Comprehend is one of the best choices.

