RARE CLASS PROBLEM IN DATA MINING: REVIEW

Snehlata S. Dongre
Latesh G. Malik

Abstract

The class imbalance problem is attracting considerable attention from researchers. Many real-life applications generate imbalanced data sets, and this imbalance makes the classification task difficult. Dealing with such data sets is one of the biggest challenges in data mining. A data set is imbalanced when the ratio of positive to negative class samples is far from balanced. The class with more samples is known as the majority class, and the class with fewer samples is known as the minority class. Minority class samples are few but important. In classification, the minority class samples are often ignored in favour of the majority class samples, which leads to good overall accuracy but a poor minority class detection rate. Many algorithms have been proposed to deal with the imbalanced data problem, but each has its pros and cons. Different techniques used for handling imbalanced data are discussed here.
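
The gap between overall accuracy and minority class detection rate can be illustrated with a minimal sketch. The data set below is a hypothetical toy example (990 majority samples, 10 minority samples), and the "classifier" is an assumed baseline that always predicts the majority class; neither comes from the article.

```python
# Hypothetical toy labels: 990 majority samples (0) and 10 minority samples (1).
y_true = [0] * 990 + [1] * 10

# Assumed baseline: a classifier that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def minority_recall(y_true, y_pred, minority_label=1):
    """Fraction of true minority samples that were predicted as minority."""
    true_pos = sum(
        t == minority_label and p == minority_label
        for t, p in zip(y_true, y_pred)
    )
    actual_pos = sum(t == minority_label for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"minority recall: {minority_recall(y_true, y_pred):.2f}")
```

Under these assumptions the baseline is right on 990 of 1000 samples yet detects none of the 10 minority samples, which is why accuracy alone is a misleading measure on imbalanced data.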

Author Biographies

Snehlata S. Dongre, PhD Scholar, Computer Science and Engineering Department, GHRCE Nagpur, India

Latesh G. Malik, Associate Professor, Computer Science and Engineering Department, Govt. Engineering College, Nagpur, India
