A Multi-Criteria Weighted Vote based Classifier Ensemble for Heart Disease Prediction

The availability of large amounts of medical data creates a need for intelligent disease prediction and analysis tools that can extract hidden information. A large number of data mining and statistical analysis tools are used for disease prediction. Single data mining techniques show an acceptable level of accuracy for heart disease diagnosis. This paper focuses on the prediction and analysis of heart disease using a weighted-vote-based classifier ensemble technique. The proposed ensemble model overcomes the limitations of conventional data mining techniques by combining five heterogeneous classifiers: Naïve Bayes, a decision tree based on the Gini index, a decision tree based on information gain, an instance-based learner, and a support vector machine. We use five benchmark heart disease datasets taken from the UCI repository. Each dataset has a different feature space, which ultimately leads to the prediction of heart disease. The effectiveness of the proposed ensemble classifier is investigated by comparing its performance with techniques reported by several other researchers. Ten-fold cross-validation is used to handle the class imbalance problem. Moreover, confusion matrices and ANOVA statistics are used to show the prediction results of all classifiers. The experimental results verify that the proposed ensemble classifier can deal with all types of attributes and that it achieves a high diagnostic accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F-measure of 82.17%. An F-ratio higher than the F-critical value and a p-value less than 0.01 for the 95% confidence interval indicate that the results are statistically significant for all the datasets.
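To make the weighted-vote idea and the reported measures concrete, the following is a minimal sketch and not the paper's implementation. It assumes each of the five classifiers has already produced a class label and been assigned a weight; the function names, the weights, and the confusion-matrix counts are hypothetical values chosen for illustration, and the multi-criteria procedure that derives the actual weights is not reproduced here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per classifier prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Iterating over sorted labels makes ties resolve to the smallest label.
    return max(sorted(scores), key=scores.get)


def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical labels (1 = disease, 0 = no disease) from the five base classifiers,
# in the order: Naive Bayes, Gini tree, information-gain tree, instance-based learner, SVM.
example_predictions = [1, 0, 1, 0, 0]
# Hypothetical weights; the paper's multi-criteria weighting would supply real ones.
example_weights = [0.30, 0.15, 0.25, 0.15, 0.15]

ensemble_label = weighted_vote(example_predictions, example_weights)

# Hypothetical confusion-matrix counts, unrelated to the paper's results.
example_metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

In this example three of the five classifiers predict class 0, so an unweighted majority vote would return 0, while the two classifiers predicting class 1 hold a combined weight of 0.55 and therefore decide the weighted vote.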