EFFICIENT DATA MINING TECHNIQUES FOR BIG DATA ANALYSIS: A SURVEY

M. Amsaveni
S. Duraisamy

Abstract

The technology revolution has benefited millions of people, and in doing so it generates tremendous volumes of data, resulting in big data. Enormous amounts of data are now generated continuously, at unprecedented and ever-increasing scales. Even though big data holds great value, extracting hidden knowledge and valuable insights from it poses tremendous challenges. This valuable information can be obtained by applying data mining techniques to big data. The goal of big data mining goes beyond fetching requested information, or even uncovering hidden relationships and patterns in the data. Big data mining involves several processes, such as feature selection (a form of dimensionality reduction), clustering and classification. This article presents a detailed comparative survey of the dimensionality reduction, clustering and classification methods proposed for big data analysis in previous research. These methods are first studied in detail; a comparative, state-of-the-art analysis is then carried out to identify their limitations.
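To make the three processes named in the abstract concrete, here is a toy, single-machine sketch in plain Python. It is not taken from any of the surveyed methods: a variance filter stands in for feature selection, Lloyd's k-means (with a farthest-first initialisation, an assumption chosen so the result is deterministic) stands in for clustering, and a nearest-centroid classifier stands in for classification. All function names are hypothetical, and the sketch ignores the distributed, large-scale concerns (e.g. MapReduce) that the surveyed works address.

```python
from statistics import pvariance

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# 1) Dimensionality reduction: simple variance filter (illustrative stand-in).
def select_top_variance(rows, k):
    """Return the sorted indices of the k highest-variance columns."""
    variances = [pvariance([r[j] for r in rows]) for j in range(len(rows[0]))]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])

def project(rows, cols):
    """Keep only the selected columns of every row."""
    return [[r[j] for j in cols] for r in rows]

# 2) Clustering: Lloyd's k-means with farthest-first initialisation (assumed).
def kmeans(points, k, iters=20):
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed = the point farthest from every centroid chosen so far
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return centroids, labels

# 3) Classification: nearest-centroid classifier trained on labelled rows.
def fit_nearest_centroid(points, classes):
    groups = {}
    for p, y in zip(points, classes):
        groups.setdefault(y, []).append(p)
    return {y: mean_vector(ps) for y, ps in groups.items()}

def predict(model, point):
    """Return the class whose centroid is closest to the point."""
    return min(model, key=lambda y: sq_dist(point, model[y]))

# Toy usage: column 1 is constant, so the filter drops it.
rows = [
    [1.0, 5.0, 2.0], [1.2, 5.0, 2.1], [0.9, 5.0, 1.9],
    [8.0, 5.0, 9.0], [8.3, 5.0, 9.2], [7.9, 5.0, 8.8],
]
cols = select_top_variance(rows, 2)   # [0, 2]
points = project(rows, cols)
_, clusters = kmeans(points, 2)       # [0, 0, 0, 1, 1, 1]
model = fit_nearest_centroid(points, ["low"] * 3 + ["high"] * 3)
print(predict(model, project([[7.5, 5.0, 8.5]], cols)[0]))  # high
```

The stages are kept as separate functions so that each could, in principle, be swapped for one of the more scalable techniques the survey compares without changing the rest of the pipeline.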

References

Acharjya, D. P., & Ahmed, K. (2016). A survey on big data analytics: challenges, open research issues and tools. Int. J. Adv. Comput. Sci. Appl., 7(2), 1-11.

Bolón-Canedo, V., Sánchez-Maroño, N., & Alonso-Betanzos, A. (2015). Recent advances and emerging challenges of feature selection in the context of big data. Knowledge-Based Systems, 86, 33-45.

Zerhari, B., Lahcen, A. A., & Mouline, S. (2015, May). Big data clustering: Algorithms and challenges. In Proc. of Int. Conf. on Big Data, Cloud and Applications (BDCA'15).

Koturwar, P., Girase, S., & Mukhopadhyay, D. (2015). A survey of classification techniques in the area of big data. arXiv preprint arXiv:1503.07477.

Liu, X. Y., Liang, Y., Wang, S., Yang, Z. Y., & Ye, H. S. (2018). A Hybrid Genetic Algorithm With Wrapper-Embedded Approaches for Feature Selection. IEEE Access, 6, 22863-22874.

Manoj, R. J., Praveena, M. A., & Vijayakumar, K. An ACO–ANN based feature selection algorithm for big data. Cluster Computing, 1-8.

Fong, S., Wong, R., & Vasilakos, A. (2016). Accelerated PSO swarm search feature selection for data stream mining big data. IEEE Transactions on Services Computing, (1), 1-1.

Badaoui, F., Amar, A., Hassou, L. A., Zoglat, A., & Okou, C. G. (2017). Dimensionality reduction and class prediction algorithm with application to microarray Big Data. Journal of Big Data, 4(1), 32.

Zhao, L., Chen, Z., Hu, Y., Min, G., & Jiang, Z. (2018). Distributed feature selection for efficient economic big data analysis. IEEE Transactions on Big Data, (2), 164-176.

Peralta, D., del Río, S., Ramírez-Gallego, S., Triguero, I., Benitez, J. M., & Herrera, F. (2015). Evolutionary feature selection for big data classification: A MapReduce approach. Mathematical Problems in Engineering, 2015, 1-11.

Kuang, L., Yang, L. T., Chen, J., Hao, F., & Luo, C. (2018). A Holistic Approach for Distributed Dimensionality Reduction of Big Data. IEEE Transactions on Cloud Computing, (2), 506-518.

Ye, M., Liu, W., Wei, J., & Hu, X. (2016). Fuzzy c-means and cluster ensemble with random projection for big data clustering. Mathematical Problems in Engineering, 2016.

Bu, F. (2018). An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT. Future Generation Computer Systems.

Bendechache, M., Kechadi, M. T., & Le-Khac, N. A. (2016, October). Efficient large scale clustering based on data partitioning. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), (pp. 612-621).

Shukla, A. K., & Muhuri, P. K. (2019). Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets. Engineering Applications of Artificial Intelligence, 77, 268-282.

Fahad, S. A., & Alam, M. M. (2016). A modified K-means algorithm for big data clustering. International Journal of Science, Engineering and Computer Technology, 6(4), 129.

Zhang, Q., Yang, L. T., Castiglione, A., Chen, Z., & Li, P. (2018). Secure weighted possibilistic c-means algorithm on cloud for clustering big data. Information Sciences.

Chen, J., Chen, H., Wan, X., & Zheng, G. (2016). MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era. Neural Computing and Applications, 27(1), 101-110.

Dagdia, Z. C. (2018). A scalable and distributed dendritic cell algorithm for big data classification. Swarm and Evolutionary Computation.

Xin, J., Wang, Z., Qu, L., & Wang, G. (2015). Elastic extreme learning machine for big data classification. Neurocomputing, 149, 464-471.

López, V., del Río, S., Benítez, J. M., & Herrera, F. (2015). Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets and Systems, 258, 5-38.

Elkano, M., Galar, M., Sanz, J., & Bustince, H. (2018). CHI-BD: A fuzzy rule-based classification system for Big Data classification problems. Fuzzy Sets and Systems, 348, 75-101.