Feature Selection

Feature selection is the process of selecting a subset of features from the entire dataset for further processing. Recently, Rough Set based approaches to feature selection have gained prominence. Most of our work on feature selection is based on Rough Sets. Highlights of our work include:

  • Rough-set theory (RST) eliminates unimportant or irrelevant features, generating a smaller set of attributes with the same, or nearly the same, classificatory power as the original. Our initial work focused on the effects of rough sets on classification: classification accuracy was mapped to the type and number of attributes in both the original and the reduced datasets, yielding a framework for applying rough sets to classification. Rough sets were then used for knowledge discovery in classification, with a significant conclusion: removing individual numeric attributes affects classification accuracy far more than removing categorical attributes.
  • FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processing algorithm that uses entropy for record reduction and rough sets for feature (attribute) selection. Used as a pre-processor for two different rare-itemset algorithms, FASTER produced a 30% attribute reduction with a speed-up of 2.6 times. For frequent itemset mining, FASTER can produce a speed-up of 3.1 times over the original algorithm while maintaining an accuracy of 71%.
  • Rough Set based approaches that use attribute dependency for feature selection have been prominent. However, this dependency measure requires calculation of the positive region, which is computationally expensive. We have proposed a new concept called “Incremental Dependency Classes (IDCs)”, which calculates attribute dependency without using the positive region: IDCs define how attribute dependency changes as we move from one record to the next. By avoiding the positive region, IDCs can be an ideal replacement for the conventional dependency measure in feature selection algorithms, especially for large datasets. We have used IDCs with various feature selection algorithms, and experiments have shown that algorithms using IDCs are more efficient and effective than those using the conventional dependency measure. A comparison framework has also been defined to justify our solution. Experiments conducted on various publicly available datasets from the UCI repository have shown that calculating dependency with IDCs reduces execution time by 70%, while feature selection algorithms using IDCs reduce execution time by almost 54%. An overall 68% decrease in required runtime memory was also observed.
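For context, the conventional measure that IDCs replace can be sketched in a few lines. The following Python sketch computes the standard rough-set dependency γ(C → D) via the positive region: records are grouped into equivalence classes by their condition-attribute values, and a class is "positive" when all its members share one decision value. This illustrates the textbook computation only, not the IDC algorithm; the function name `dependency` and the toy table are hypothetical, for illustration.

```python
from collections import defaultdict

def dependency(records, condition_attrs, decision_attr):
    """Classical rough-set dependency gamma(C -> D): the fraction of
    records lying in the positive region, i.e. records whose equivalence
    class under the condition attributes C is consistent (all members
    share a single value of the decision attribute D)."""
    decisions = defaultdict(set)  # C-signature -> decision values seen
    sizes = defaultdict(int)      # C-signature -> number of records
    for r in records:
        key = tuple(r[a] for a in condition_attrs)
        decisions[key].add(r[decision_attr])
        sizes[key] += 1
    positive = sum(n for key, n in sizes.items() if len(decisions[key]) == 1)
    return positive / len(records)

# Toy decision table (hypothetical data, for illustration only).
table = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "yes", "play": "yes"},
]
# The two "rain"/"yes" records disagree on the decision, so only half
# of the table lies in the positive region.
print(dependency(table, ["outlook", "windy"], "play"))  # 0.5
print(dependency(table, ["outlook"], "play"))           # 0.0
```

Note that this computation partitions the whole dataset before any dependency value is available; the appeal of an incremental scheme such as IDCs is that the measure can instead be updated record by record.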

Related Publications:

  • Muhammad Summair Raza, Usman Qamar, An incremental dependency calculation technique for feature selection using rough sets, Information Sciences, Elsevier, ISI Indexed, IF: 3.89 (Accepted)

  • Usman Qamar, A dissimilarity measure based Fuzzy c-means (FCM) clustering algorithm, Journal of Intelligent and Fuzzy Systems, Volume 26, Number 1, Pages 229-238, IOS Press, ISI Indexed, IF: 0.922
  • Usman Qamar, John Keane, Clustering Using Rough-Set Feature Selection, Journal of Basic and Applied Scientific Research (JBASR), Volume 2, Issue 5, pp 5578-5591, 2012, ISI Indexed
  • Syed Hasnain Ali, Madiha Guftar, Abdul Wahab Muzaffar, Usman Qamar, A Feature Reduction Framework based on Rough Set for Bio Medical Data Sets, IEEE SAI Intelligent Systems Conference 2015 (IntelliSys 2015), London, UK; 11/2015
  • Usman Qamar, Younus Javed: Frequent Itemset Mining using Rough-Sets. XII International Conference on Computer Science and Information Management (ICCSIM), London, UK; 10/2014
  • Usman Qamar, Younus Javed: FASTER: A Hybrid Algorithm for Feature Selection and Record Reduction in Rare Frequent Itemset. Proceedings of the World Congress on Engineering 2014, London, UK; 07/2014
  • Usman Qamar: A Rough-Set Feature Selection Model for Classification and Knowledge Discovery. Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on, Manchester; 10/2013
  • Usman Qamar, Technical Report, A FASTER way to record reduction and attribute selection for large Data, School of Computer Science, University of Manchester, 2010.
  • Usman Qamar, Technical Report, Rough Set Tutorial, School of Computer Science, University of Manchester, 2010.