Analyzing mobile phone usage using clustering in Spark MLlib and Pig


Shefali Arora


K-means is a common method for partitioning data points into a predefined number of clusters. Apache Spark is a distributed computing engine designed for fast processing of large datasets. Using its machine learning library, MLlib, we analyze mobile data obtained from by clustering the records according to their latitude and longitude values with the K-means algorithm. Once each data point has been assigned a cluster number, the dataset is loaded into Apache Pig to count the number of users in each cluster. This shows how many users access a mobile network within a particular range of latitude and longitude.
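The pipeline described above has two stages. First, K-means assigns each (latitude, longitude) point a cluster number. Second, a grouping step counts users per cluster. The plain-Python code below is a minimal sketch of what those two stages compute: Lloyd's K-means followed by a per-cluster count of distinct users. It is not Spark or Pig code. In Spark, the first stage would typically use MLlib's `KMeans` estimator, and in Pig the second stage would be a `GROUP ... BY` followed by `COUNT`. The record layout `(user_id, lat, lon)`, the function names, and the sample coordinates are all hypothetical. Distances treat latitude and longitude as flat 2-D coordinates, which is a reasonable approximation only over small areas.

```python
import random
from collections import defaultdict

# Hypothetical sample records: (user_id, latitude, longitude).
SAMPLE_RECORDS = [
    ("u1", 28.61, 77.20), ("u2", 28.62, 77.21),
    ("u3", 28.60, 77.22), ("u2", 28.63, 77.19),  # u2 appears twice in this group
    ("u4", 19.07, 72.87), ("u5", 19.08, 72.88),
]


def squared_distance(a, b):
    """Squared Euclidean distance, treating (lat, lon) as planar coordinates."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on (lat, lon) tuples; returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for p, c in zip(points, new_labels):
            sums[c][0] += p[0]
            sums[c][1] += p[1]
            sums[c][2] += 1
        new_centroids = [
            (s[0] / s[2], s[1] / s[2]) if s[2] else centroids[c]  # empty cluster stays put
            for c, s in enumerate(sums)
        ]
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


def users_per_cluster(records, labels):
    """Count distinct user IDs per cluster (the role played by GROUP BY + COUNT in Pig)."""
    users = defaultdict(set)
    for (user_id, _lat, _lon), cluster in zip(records, labels):
        users[cluster].add(user_id)
    return {cluster: len(ids) for cluster, ids in sorted(users.items())}


if __name__ == "__main__":
    points = [(lat, lon) for _uid, lat, lon in SAMPLE_RECORDS]
    centroids, labels = kmeans(points, k=2)
    print("centroids:", centroids)
    print("users per cluster:", users_per_cluster(SAMPLE_RECORDS, labels))
```

The sample contains two well-separated groups of points. The run should therefore place the four records near (28.6, 77.2) in one cluster (3 distinct users) and the two records near (19.07, 72.87) in the other (2 distinct users). Which group receives index 0 or 1 depends on the random initialization. Counting distinct IDs rather than rows is a design choice made here so that a user who appears more than once in the same area is counted only once.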

Keywords: Spark, Pig, clustering, mobile, data, analysis


