Sentiment analysis is one of the most important fields of natural language processing (NLP). Although artificial intelligence in general, and NLP in particular, have become increasingly popular over the past ten years, sentiment analysis is still referred to by many different names: opinion mining, polarity detection, opinion extraction, sentiment mining, subjectivity analysis, emotion analysis, review mining, and so on. Put simply, sentiment analysis is the field of study that extracts people’s sentiments, attitudes and opinions towards entities such as products, services, organizations, individuals, issues, events and topics.
Image 1: Opinions of users can be extracted from social media sources
We are in the era of Industry 4.0, in which text data from social networks, e-commerce websites and other social media such as blogs and forum discussions (image 1) is overwhelming other kinds of human-created data. These days, social networks and e-commerce websites are the most important places for people to express their opinions, and those opinions are key influences on our behavior. Sentiment analysis is the process of automatically identifying whether a text expresses a positive, negative or neutral opinion. Whenever a company wants to expand the market for a product, it wants to know its customers’ opinions. When lawmakers want to promulgate a new law, they may need to hold a referendum. Sentiment analysis therefore plays a vital role for companies in marketing and public relations campaigns. By conducting activities such as surveys and opinion polls, companies learn what the public thinks and can make strategic decisions accordingly.
In NLP, sentiment analysis is a very difficult problem. First, understanding the sentiment of what someone is saying is sometimes tricky even for humans: a sentence that is positive in one context may be negative in another context or with a different tone. Second, ambiguity is a major problem in NLP in general and in sentiment analysis in particular. For example, the word “sucks” in “This camera sucks” expresses a negative opinion, but in “This vacuum cleaner really sucks” it may imply a positive sentiment. Third, people can express opinions about several aspects in one review: “This iPhone is slim but the camera is too big for me”. This sentence is positive about the iPhone’s thickness but negative about the camera. Likewise, in the message “My mother loves it but I don’t”, the first clause is positive for the speaker’s mother, but the second is negative for the speaker.
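The ambiguity problem can be made concrete with a small sketch. The lexicon and the scoring function below are hypothetical and invented only for illustration. They show what happens when each word is given a fixed polarity regardless of context, which is roughly the simplest approach one could try.

```python
import re

# Hypothetical toy polarity lexicon, invented for this illustration only.
LEXICON = {"sucks": -1, "slim": 1, "loves": 1, "big": -1}

def lexicon_score(text):
    """Sum the fixed polarity of every known word, ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

camera = lexicon_score("This camera sucks")
vacuum = lexicon_score("This vacuum cleaner really sucks")
mixed = lexicon_score("This iPhone is slim but the camera is too big for me")

print(camera, vacuum, mixed)
```

Both “sucks” sentences receive the same negative score, although the second one may be praise. The iPhone review scores zero because its positive and negative aspects cancel out, which hides the fact that it contains two distinct opinions.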
Sentiment analysis has made many advances recently in the era of big data, social networks, machine learning and deep learning. The opinions, reviews and attitudes of Internet users or customers can be extracted from social networks or websites. This data then needs to be pre-processed: removing HTML tags, punctuation and emoticons (if needed), splitting documents into sentences, etc. The resulting raw text cannot be understood directly by a machine; instead, it has to go through a process called feature extraction (or feature engineering), where a feature is an individual measurable property or characteristic. This process transforms raw text into numerical features that are usable by machine learning models. The performance of a machine learning model depends heavily on the feature extraction process. In sentiment analysis, the simplest features are unigram or bigram bag-of-words counts or tf-idf weights. A machine learning model such as an SVM, Naïve Bayes or a neural network can then be used to classify the text as positive, negative or neutral. In subsequent research, many more features, techniques and learning algorithms have been tried by a large number of researchers. These features include word representations, part-of-speech (POS) tags, sentiment dictionaries, rules of opinions, etc.
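The pipeline described above (pre-processing, feature extraction, classification) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation. The four training reviews are invented, all function and class names are hypothetical, the tf-idf formula is only one common variant, and the classifier is a basic multinomial Naïve Bayes with add-one smoothing trained on raw counts rather than tf-idf weights.

```python
import math
import re
from collections import Counter

def preprocess(raw):
    """Strip HTML tags and punctuation, lowercase, and split into tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)        # remove HTML tags
    return re.findall(r"[a-z']+", text.lower())  # keep alphabetic tokens only

def ngram_features(tokens, max_n=2):
    """Bag of words over unigrams and bigrams, stored as term counts."""
    grams = []
    for n in range(1, max_n + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

def tfidf(docs):
    """Weight each count by log(N / document frequency); one common variant."""
    df = Counter(term for doc in docs for term in doc)
    return [{t: c * math.log(len(docs) / df[t]) for t, c in doc.items()}
            for doc in docs]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            members = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(members) / len(docs))
            self.counts[c] = sum(members, Counter())
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                n * math.log((self.counts[c][t] + 1) / denom)
                for t, n in doc.items() if t in self.vocab  # skip unseen terms
            )
        return max(self.classes, key=log_posterior)

def featurize(raw):
    return ngram_features(preprocess(raw))

# Invented toy training data, for illustration only.
train = [
    ("<p>I love this phone, the screen is great!</p>", "positive"),
    ("Great camera and I love the battery", "positive"),
    ("This camera sucks, terrible battery", "negative"),
    ("I hate the screen, it is terrible", "negative"),
]
features = [featurize(text) for text, _ in train]
labels = [label for _, label in train]

weights = tfidf(features)                 # tf-idf view of the same documents
model = NaiveBayes().fit(features, labels)

pred_pos = model.predict(featurize("I love the great screen"))
pred_neg = model.predict(featurize("Terrible camera, it sucks"))
print(pred_pos, pred_neg)
```

On this toy data the two unseen reviews are classified as positive and negative respectively. With only four training examples the result says nothing about real-world accuracy. The point is only to show how raw text becomes numerical features that a classifier can consume.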
Image 2: Sentiment analysis process from Twitter
Liu B., 2012, “Sentiment Analysis and Opinion Mining”
Erik C., Dipankar D., Sivaji B., Antonio F., “A Practical Guide to Sentiment Analysis”