EFFICIENT DATA MINING TECHNIQUES FOR BIG DATA ANALYSIS: A SURVEY

M. Amsaveni; S. Duraisamy

doi:10.26483/ijarcs.v9i6.6348

Published: Dec 20, 2018

Keywords:

Big data, data mining, dimensionality reduction, clustering, classification.

Abstract

The technology revolution has benefited millions of people while generating tremendous amounts of data, resulting in big data. It is a well-established phenomenon that enormous amounts of data are being generated continuously at unprecedented and ever-increasing scales. Although big data holds great value, extracting hidden knowledge and valuable insights from it poses tremendous challenges. This valuable information can be obtained by applying data mining techniques to big data. The goals of big data mining go beyond fetching requested information, or even uncovering hidden relationships and patterns in the data. Big data mining involves various processes such as feature selection, clustering and classification. In this article, a detailed comparative survey of big data mining processes, namely dimensionality reduction, clustering and classification, is presented. First, the dimensionality reduction, clustering and classification methods proposed for big data analysis in previous research are studied in detail. Then, a comparative, state-of-the-art analysis is carried out to identify the limitations of those methods.
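
The abstract names three stages of big data mining: dimensionality reduction (feature selection), clustering and classification. The following toy sketch is not taken from the survey and is not any of the surveyed methods; it simply chains one textbook technique per stage in plain Python: a variance-based filter for feature selection, Lloyd's k-means with farthest-first seeding for clustering, and a nearest-centroid classifier. The function names (`select_features`, `kmeans`, `fit_nearest_centroid`, `predict`) and the six-row dataset are hypothetical, and reusing the cluster labels as class labels is an assumption made purely for illustration. A real big data pipeline would run these steps in a distributed framework, which this single-process sketch does not attempt.

```python
import math
from statistics import pvariance


def select_features(rows, k):
    """Filter-style feature selection: keep the k columns with the highest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    # Rank columns by variance, take the top k, then restore original column order.
    keep = sorted(sorted(range(n_cols), key=lambda j: variances[j], reverse=True)[:k])
    return keep, [[row[j] for j in keep] for row in rows]


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fit_nearest_centroid(points, labels):
    """Classification model: one mean vector (centroid) per class label."""
    model = {}
    for lab in set(labels):
        members = [p for p, other in zip(points, labels) if other == lab]
        model[lab] = [sum(col) / len(members) for col in zip(*members)]
    return model


def predict(model, point):
    """Assign the class whose centroid is nearest to the point."""
    return min(model, key=lambda lab: math.dist(point, model[lab]))


# Hypothetical toy data: columns 0 and 2 separate two groups, column 1 is constant.
data = [
    [1.0, 0.5, 1.1], [1.2, 0.5, 0.9], [0.9, 0.5, 1.0],
    [8.0, 0.5, 8.2], [8.3, 0.5, 7.9], [7.9, 0.5, 8.1],
]
kept, reduced = select_features(data, k=2)  # the zero-variance column 1 is dropped
_, clusters = kmeans(reduced, k=2)
model = fit_nearest_centroid(reduced, clusters)
new_row = [7.5, 0.4, 8.0]
print(kept, clusters, predict(model, [new_row[j] for j in kept]))
# prints: [0, 2] [0, 0, 0, 1, 1, 1] 1
```

The zero-variance column carries no information for separating the rows, so the filter removes it before clustering; the new row is then projected onto the same kept columns before classification.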

Issue

Vol. 9 No. 6 (2018): November-December 2018

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before, that it is not under consideration for publication elsewhere, and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.
