Building Normalized SentiMI to enhance semi-supervised sentiment analysis

Sentiment analysis, or polarity detection, is a type of text classification in which natural-language opinions are analyzed and classified as either positive or negative. Classifying text into sentiment labels is difficult because opinions expressed in natural language may contain abbreviations, slang, sarcasm, irony and/or idioms. The proposed research focuses on the use of SentiWordNet 3.0 as a labeled corpus for training purposes. We present a complete framework based on a dictionary named Normalized SentiMI (nSentiMI), which is created by calculating point-wise mutual information for each term/part-of-speech pair extracted from SentiWordNet. The proposed framework is applied to a dataset of 50,000 movie reviews to identify the value of a weight factor α and is then evaluated on an unseen test dataset of 2,000 movie reviews. A comparison with state-of-the-art techniques also confirms the superiority of the proposed approach.
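To illustrate the point-wise mutual information step, the following is a minimal sketch using the standard definition PMI(x, c) = log2(p(x, c) / (p(x) p(c))), where x is a term/part-of-speech pair and c is a polarity label. The function name `pmi_table`, the toy observations, and the label names are assumptions made for illustration only; the normalization used to build nSentiMI and the role of the weight factor α are not specified here and are not reproduced.

```python
import math
from collections import Counter


def pmi_table(observations):
    """Compute PMI for every observed ((term, pos), label) combination.

    `observations` is an iterable of ((term, pos), label) tuples. This
    input format is a hypothetical one chosen for the sketch.
    """
    obs = list(observations)
    n = len(obs)
    pair_counts = Counter(pair for pair, _ in obs)      # counts of (term, pos)
    label_counts = Counter(label for _, label in obs)   # counts of each label
    joint_counts = Counter(obs)                         # counts of (pair, label)

    table = {}
    for (pair, label), joint in joint_counts.items():
        p_joint = joint / n
        p_pair = pair_counts[pair] / n
        p_label = label_counts[label] / n
        # Standard PMI: log2 of the joint over the product of the marginals.
        table[(pair, label)] = math.log2(p_joint / (p_pair * p_label))
    return table


# Toy data (invented for illustration, not taken from SentiWordNet).
toy = (
    [(("good", "a"), "positive")] * 3
    + [(("good", "a"), "negative")] * 1
    + [(("bad", "a"), "negative")] * 3
    + [(("bad", "a"), "positive")] * 1
)
table = pmi_table(toy)
```

In this toy example the pair ("good", "a") receives a positive PMI with the positive label and a negative PMI with the negative label, which is the kind of signal a PMI-based sentiment dictionary would store for each term/part-of-speech pair.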