SentiMI: Introducing Point-wise Mutual Information with SentiWordNet to Improve Sentiment Polarity Detection

Supervised learning has attracted much attention in recent years. As a consequence, many state-of-the-art algorithms are domain dependent, since they require a labeled training corpus to learn the domain features. Building such labeled corpora is a cumbersome task in itself. For text sentiment detection, however, SentiWordNet (SWN) may be used instead. It is a vocabulary in which terms are arranged in synonym groups called synsets. This research treats SentiWordNet as the labeled corpus for training. A sentiment dictionary, SentiMI, is built from the mutual information calculated over these terms. A complete framework is developed by applying feature selection and then extracting, from SentiMI, the mutual information of the selected features. Training, testing and evaluation of the proposed framework are conducted on a large dataset of 50,000 movie reviews. Compared to the baseline SentiWordNet classifier, the proposed framework achieves a notable improvement of 7% in accuracy, 14% in specificity and 8% in f-measure. A comparison with state-of-the-art classifiers on the widely used Cornell Movie Review dataset further supports the effectiveness of the proposed approach.
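
As an illustration of the point-wise mutual information idea named in the title, the following is a minimal sketch only. It assumes the standard definition PMI(t, c) = log2(P(t, c) / (P(t) P(c))) estimated from counts over labeled entries; the abstract does not state the exact formulation used to build SentiMI. The function name `pmi_table`, the `pos`/`neg` labels and the toy entries are hypothetical and are not taken from SentiWordNet.

```python
import math
from collections import Counter


def pmi_table(entries):
    """Return {(term, label): PMI in bits} for term/label pairs that co-occur.

    `entries` is a list of (terms, label) pairs. Each pair stands in for one
    labeled synset-like group of terms (an assumption made for illustration).
    """
    n = len(entries)
    label_counts = Counter(label for _, label in entries)
    term_counts = Counter()
    joint_counts = Counter()
    for terms, label in entries:
        for term in set(terms):  # count each term at most once per entry
            term_counts[term] += 1
            joint_counts[(term, label)] += 1

    table = {}
    for (term, label), joint in joint_counts.items():
        p_joint = joint / n
        p_term = term_counts[term] / n
        p_label = label_counts[label] / n
        table[(term, label)] = math.log2(p_joint / (p_term * p_label))
    return table


# Hypothetical toy data, not real SentiWordNet content.
toy_entries = [
    (["good", "fine"], "pos"),
    (["good", "great"], "pos"),
    (["bad", "poor"], "neg"),
    (["fine", "poor"], "neg"),
]
scores = pmi_table(toy_entries)
```

In this toy data, a term that appears only under one label receives a positive score for that label, while a term split evenly across both labels scores zero. Pairs that never co-occur are simply absent from the table, since their PMI would be negative infinity.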