MAPREDUCE: INSIGHT ANALYSIS OF BIG DATA VIA PARALLEL DATA PROCESSING USING JAVA PROGRAMMING, HIVE AND APACHE PIG

UJJWAL Agarwal

Abstract

Digital data come from many sources, such as offices, schools, hospitals, social media, and machines. Apache Hadoop is a software framework for storing and processing this enormous amount of data. Hadoop uses HDFS to store the data and MapReduce to process it. MapReduce is a programming model introduced by Google, and MapReduce programs can be written in different programming languages such as Java, Python, and Ruby. The main objective of this paper is to describe the concepts of MapReduce and to demonstrate its operation using a Java program, Apache Pig, and Hive. Hive and Apache Pig work on the top layer of the Hadoop ecosystem and provide a level of abstraction for running MapReduce jobs. We write a MapReduce program in Java that finds anagram words in the input files, groups them together, and saves the result to an output file. Finally, we perform the same operation in HiveQL (Hive Query Language) and in a Pig Latin script, and show the MapReduce job that runs in the backend.
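The anagram-grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's program and does not use the Hadoop API. It is a plain-Java simulation of the map, shuffle, and reduce phases, under the assumption that the map step emits each word keyed by its letters in sorted order. The class name `AnagramSketch`, its method names, the sample words, and the choice to drop keys that have only one word are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map, shuffle, and reduce (assumption: no Hadoop dependency).
class AnagramSketch {

    // Map phase: emit (letters of the word in sorted order, original word).
    static Map.Entry<String, String> map(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return Map.entry(new String(letters), word);
    }

    // Shuffle phase: collect all values that share a key, which a
    // MapReduce framework does between the map and reduce phases.
    static Map<String, List<String>> shuffle(List<String> words) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String word : words) {
            Map.Entry<String, String> pair = map(word);
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                  .add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: join the words of each key into one output value.
    // Assumption: keys with a single word are not anagram groups and are dropped.
    static Map<String, String> reduce(Map<String, List<String>> groups) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), String.join(",", e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the words of the input files.
        List<String> words = List.of("listen", "silent", "enlist", "google", "act", "cat");
        reduce(shuffle(words))
            .forEach((key, group) -> System.out.println(key + "\t" + group));
    }
}
```

In a real Hadoop job, the map and reduce steps would be implemented as Mapper and Reducer classes and the framework would perform the grouping by key. Here the `shuffle` method stands in for that step so the sketch runs on its own.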

Section
Articles
Author Biography

UJJWAL Agarwal

Lecturer (Information Technology)