Chatbot, Intent and Entity

I. What is a chatbot

A chatbot is an application that tries to simulate the conversation, or “chatter”, of a human being through text or voice interactions. A user can ask a chatbot a question or give it a command, and the chatbot responds or performs the requested action.

Applications of chatbots include virtual assistants (Siri, Google Assistant, Cortana), smart speakers (Alexa), and smart home applications (Jarvis).

But how does a chatbot work?

II. How a chatbot works

A common approach to building a chatbot is to use intent detection and entity detection.

Intent detection helps the system understand what topic a sentence is about.

Entity detection helps the system extract the attributes that go with that intent.


“I will come at 8pm” => the entity is “8pm”, which belongs to the time field.

To perform these two functions (intent detection and entity detection), we use two machine learning models: an intent classifier and an entity classifier.

III. Intent classification

In science, the definition of classification is simple to understand: a dog is in the class “animals”, the sun is in the class “stars”.

In our situation, we need a model that classifies the topic of a sentence (an intent classifier).

Think of this model as a way of categorizing a piece of data (a sentence) into one of several categories (intents). For example, the input “how are you?” is classified into the intent (class) “health”. Based on this intent, the chatbot returns an associated response such as “I am fine” or “I am good”.

As the first step, we clean the sentence by removing punctuation, teencode (informal texting slang and abbreviations), and extra whitespace.
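
The cleaning step could be sketched as follows. This is a minimal illustration, not a complete normalizer: the function name `clean_sentence` and the tiny `TEENCODE` table are hypothetical, and the sketch assumes that teencode is replaced by its standard spelling rather than deleted outright.

```python
import string

# Hypothetical teencode lookup table (an assumption for illustration);
# a real table would be much larger and specific to the users' language.
TEENCODE = {"u": "you", "r": "are", "thx": "thanks"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_sentence(sentence: str) -> str:
    """Lowercase, strip punctuation, normalize teencode, collapse whitespace."""
    text = sentence.lower().translate(PUNCT_TO_SPACE)
    # split() without arguments also collapses runs of whitespace.
    tokens = [TEENCODE.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(clean_sentence("  How r u ???  "))
```

Only ASCII punctuation is handled here; typographic quotes or language-specific marks would need to be added to the table.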

As the second step, because a machine cannot understand raw text, we need to convert the text to numbers. For this we use the bag-of-words (BOW) model.

The idea of BOW is to count how often each word appears in a document, with respect to a dictionary: the collection of all distinct words that appear across all documents.

Example 1:

(1) John likes to watch movies. Mary likes movies too.

(2) John also likes to watch football games.

=> we build a dictionary that contains the words of the two sentences above:

[“John”, “likes”, “to”, “watch”, “movies”, “Mary”, “too”, “also”, “football”, “games” ]

Using BOW, we convert the two sentences into two vectors:

(1) [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]

(2) [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
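
Example 1 can be reproduced with a few lines of Python. The helper names (`tokenize`, `build_dictionary`, `bow_vector`) are made up for this sketch, and the tokenizer simply keeps runs of ASCII letters, which is enough for these two sentences but would be too crude for real input.

```python
import re
from collections import Counter


def tokenize(sentence):
    # Keep runs of letters only; this drops the full stops.
    return re.findall(r"[A-Za-z]+", sentence)


def build_dictionary(sentences):
    # Distinct words, in order of first appearance.
    dictionary = []
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in dictionary:
                dictionary.append(word)
    return dictionary


def bow_vector(sentence, dictionary):
    # One count per dictionary entry; words absent from the sentence get 0.
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in dictionary]


sentences = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]
dictionary = build_dictionary(sentences)
vectors = [bow_vector(sentence, dictionary) for sentence in sentences]

print(dictionary)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to the dictionary word at the same index, so both vectors always have the length of the dictionary.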

This transformation converts the text to numbers while keeping which words occur in each sentence and how often, although the word order is lost.

But BOW has a problem: common words (the, is, a, ...) appear frequently throughout the corpus, so they dominate the vectors while carrying little information. This leads to poor predictions and low accuracy. A solution is to use TF-IDF.

TF-IDF consists of two parts: TF and IDF.

  1. TF (term frequency) is the frequency of a word within a document. It is computed as:

$$\mathrm{tf}(t, d) = \frac{f(t, d)}{\sum_{t' \in d} f(t', d)}$$

f(t, d): the number of times word t appears in document d

Denominator: the total number of words in document d

  2. IDF (inverse document frequency) is the inverse frequency of a word in the corpus (the set of all documents). The goal of IDF is to reduce the weight of common words (the, is, a, ...), since these words carry little meaning for classifying text. In its usual logarithmic form it is computed as:

$$\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$$

N: the total number of documents in the corpus D

Denominator: the number of documents d that contain word t

If word t does not appear in any document, the denominator equals 0. To avoid a division by zero, we add 1 to the denominator:

$$\mathrm{idf}(t, D) = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$

We combine the two formulas, TF and IDF:

$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$$

This formula down-weights common words and keeps the valuable words (those with a high TF-IDF score), since those words appear often in this document and rarely in other documents.

In conclusion, we combine the two models, BOW and TF-IDF, to overcome the weakness of BOW, and we use the result to build the intent classifier.
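
The TF-IDF computation can be sketched directly from the definitions above. The sketch assumes the usual logarithmic form of IDF with 1 added to the denominator, and it reuses the two sentences of Example 1, lowercased and stripped of punctuation. The function names are illustrative only.

```python
import math
from collections import Counter


def tf(term, doc):
    # Count of the term in the document, divided by the document length.
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    # log(N / (1 + number of documents that contain the term)).
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / (1 + containing))


def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


docs = [
    "john likes to watch movies mary likes movies too".split(),
    "john also likes to watch football games".split(),
]

# "likes" occurs in both documents, "movies" only in the first one.
for term in ("likes", "movies"):
    print(term, round(tfidf(term, docs[0], docs), 3))
```

Both words occur twice in the first sentence, yet “movies” scores higher than “likes” because “likes” also occurs in the second sentence. With a corpus this small, the added 1 makes the scores zero or negative. Libraries tend to use a slightly different smoothing to avoid this; scikit-learn, for instance, computes ln((1 + N) / (1 + df)) + 1 by default.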

As the last step, we use these vectors as the input of a machine learning model that predicts the corresponding intent.
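
To make this last step concrete, here is a deliberately simple stand-in for the machine learning model: a nearest-neighbour classifier that compares BOW vectors by cosine similarity. It was chosen for brevity and is not necessarily the model a production chatbot would use. The training sentences and the intent name “schedule” are hypothetical, and the input is assumed to be already cleaned (lowercase, no punctuation).

```python
import math
from collections import Counter

# Hypothetical training data: (cleaned sentence, intent) pairs.
TRAINING = [
    ("how are you", "health"),
    ("are you feeling well today", "health"),
    ("what time will you come", "schedule"),
    ("i will come at 8pm", "schedule"),
]

# Dictionary of all distinct words seen in training.
DICTIONARY = sorted({word for sentence, _ in TRAINING for word in sentence.split()})


def vectorize(sentence):
    # BOW vector over the training dictionary; unknown words are ignored.
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in DICTIONARY]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


TRAIN_VECTORS = [(vectorize(sentence), intent) for sentence, intent in TRAINING]


def predict_intent(sentence):
    # Return the intent of the most similar training sentence.
    query = vectorize(sentence)
    _, intent = max(TRAIN_VECTORS, key=lambda pair: cosine(query, pair[0]))
    return intent


print(predict_intent("how are you today"))
print(predict_intent("when will you come"))
```

Replacing `vectorize` with TF-IDF weights, or the nearest-neighbour rule with a trained classifier, would not change the overall shape: sentence in, vector in the middle, intent out.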

IV. Entity classification

In NLP, to detect an entity (person, location, date, number, ...) in a document, we use a technique called Named Entity Recognition (NER), also known as entity extraction.

NER is a subtask of information extraction that seeks to locate and classify named entities in text (documents, sentences, ...) into predefined categories such as the names of persons, organizations, and locations.


“Jim bought 300 shares of Acme Corp. in 2006”

After using NER, we have:

“[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time”


How does the system recognize that “Jim” is a person, “Acme Corp.” is an organization, and “2006” is a time?

The answer is to train a sequence tagger: a model that works like a part-of-speech (POS) tagger, but assigns an entity label instead of a grammatical one to each token (word) of the document.



“Jim/@Person bought/O_W 300/O_W shares/O_W of/O_W Acme Corp./@Organization in/O_W 2006/@Time”

Here, the tagger assigns the label @Person to Jim, @Organization to Acme Corp., and @Time to 2006.

The system can recognize these entities after training, provided that labeled examples of them are included in the training data set. In some situations, simple features of a token drive the prediction. If a token starts with an uppercase letter and follows the token “from” or “to”, it is predicted as @Location. If a token has the format DD/MM/YYYY, it is classified as @Date. If a token has the format HH:MM (AM, PM), it is classified as @Time. All tokens that have none of these characteristics are classified as O_W.
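
The token patterns described above can be written down as hand-coded rules. This is only a sketch of the features involved; a trained tagger learns such regularities from labeled data instead of hard-coding them. The function name `tag_tokens`, the regular expressions, and the sample sentence are assumptions made for illustration.

```python
import re

DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")                       # DD/MM/YYYY
TIME_RE = re.compile(r"\d{1,2}:\d{2}(AM|PM)?", re.IGNORECASE)    # HH:MM, optional AM/PM


def tag_tokens(tokens):
    """Label each token with @Date, @Time, @Location, or O_W."""
    tagged = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1].lower() if i > 0 else ""
        if DATE_RE.fullmatch(token):
            label = "@Date"
        elif TIME_RE.fullmatch(token):
            label = "@Time"
        elif token[:1].isupper() and previous in ("from", "to"):
            label = "@Location"
        else:
            label = "O_W"
        tagged.append((token, label))
    return tagged


# Hypothetical sample sentence, split on whitespace.
sentence = "I fly from Hanoi to Paris on 25/12/2024 at 08:30AM"
tagged = tag_tokens(sentence.split())
print(" ".join(f"{token}/{label}" for token, label in tagged))
```

The rules only check the shape of a token, so an impossible date such as 99/99/2024 would still be labeled @Date, and a multi-word name after “from” would only have its first word labeled.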

At present, a model with good accuracy for this task combines deep learning with a conditional random field (CRF). It is known as Bi-LSTM-CRF.

V. Conclusion

Chatbots are a hot trend, and the techniques behind them improve every day. In the future, chatbots may replace many applications as the way users get their requests carried out.