FiDoop: An Interactive GUI to Identify Frequent Items Using Map Reduce

Raksha D
P Hari Prasad Reddy
Mukesh P U
Prof. Raghavendra Reddy

Abstract

Due to the exponential growth of real-time data monitoring systems, extracting frequent itemsets from large databases is a challenging task. Existing frequent itemset mining algorithms suffer from high memory usage, excessive runtime even on small amounts of data, and a lack of automatic parallelization. To overcome these problems, a FiDoop-based frequent itemset mining algorithm built on the MapReduce framework is introduced. The system comprises the following activities: data uploading, preprocessing, threshold setting, support and confidence computation, merging, and result generation. We implement FiDoop on our in-house Hadoop cluster. To improve FiDoop's performance, a workload balance metric is developed to measure load balancing across the cluster's computing nodes. Initially, data is selected from the dataset and uploaded to the server; the preprocessing stage then removes columns that contain unwanted entries. The data is analyzed and partitioned to compute the threshold value. Finally, the frequent itemsets are merged to obtain the frequent patterns. The proposed system is mainly developed to improve accuracy and is evaluated using performance measures.
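The pipeline described in the abstract (preprocessing, a support threshold, support and confidence computation, and merging of partial results) can be illustrated with a small single-process sketch. It simulates the map and reduce phases in plain Python and, as an assumption, uses Apriori-style level-wise candidate counting; it is not the FiDoop algorithm itself and does not use Hadoop. The function names `preprocess`, `map_partition`, `reduce_counts`, `mine_frequent_itemsets` and `confidence`, the `UNWANTED` placeholder values, and the round-robin partitioning are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical placeholder values treated as "unwanted entries" in preprocessing.
UNWANTED = {"", "NA", "null"}


def preprocess(raw_transactions):
    """Strip whitespace, drop unwanted entries, and de-duplicate items per row."""
    cleaned = []
    for row in raw_transactions:
        items = frozenset(x.strip() for x in row if x.strip() not in UNWANTED)
        if items:  # skip rows that are empty after cleanup
            cleaned.append(items)
    return cleaned


def map_partition(partition, k, prev_frequent):
    """Mapper: count candidate k-itemsets in one partition (local combine step)."""
    local = Counter()
    for t in partition:
        for combo in combinations(sorted(t), k):
            # Apriori pruning: every (k-1)-subset must have been frequent.
            if k == 1 or all(frozenset(s) in prev_frequent
                             for s in combinations(combo, k - 1)):
                local[frozenset(combo)] += 1
    return local


def reduce_counts(partials):
    """Reducer: merge per-partition counts into global support counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def mine_frequent_itemsets(transactions, min_support, num_partitions=2):
    """Return {itemset: support count} for itemsets with count / n >= min_support."""
    n = len(transactions)
    # Hypothetical round-robin split so each simulated mapper gets a similar row count.
    partitions = [transactions[i::num_partitions] for i in range(num_partitions)]
    frequent, prev, k = {}, set(), 1
    while True:
        counts = reduce_counts(map_partition(p, k, prev) for p in partitions)
        level = {s: c for s, c in counts.items() if c / n >= min_support}
        if not level:  # no frequent k-itemsets, so no frequent (k+1)-itemsets either
            break
        frequent.update(level)  # merge this level into the overall result
        prev, k = set(level), k + 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = count(A ∪ B) / count(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Usage on a toy dataset with a 60% minimum support threshold.
raw = [
    ["bread", "milk", ""],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola", "NA"],
]
transactions = preprocess(raw)
result = mine_frequent_itemsets(transactions, min_support=0.6)
print(len(result))                               # 8
print(confidence(result, ["diaper"], ["beer"]))  # 0.75
```

Support counts are kept as integers and converted to ratios only when compared with the threshold, so confidence is computed from exact counts. The round-robin split is only a naive stand-in for workload balancing; the abstract does not specify how FiDoop's workload balance metric is defined.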
