Analyzing mobile phone usage using clustering in Spark MLLib and Pig

Shefali Arora

Abstract


K-means is a common method for clustering data points into a predefined number of clusters. Apache Spark is a computing framework designed for fast, large-scale data processing. Using its machine learning library, MLlib, we analyze mobile data obtained from Opencellid.org by clustering records according to their latitude and longitude values with the K-means algorithm. Once each data point has been assigned a cluster number, the dataset is loaded into Apache Pig to count the users in each cluster. This lets us analyze how many users access a mobile network within a particular range of latitude and longitude.
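
The two stages described above (assign each record to a cluster, then count records per cluster) can be illustrated with a minimal, dependency-free Python sketch. It is not the Spark MLlib or Pig code used in the paper: the function `kmeans`, the sample coordinates, the evenly spaced centroid initialization, and the use of plain Euclidean distance on latitude and longitude are all assumptions made for illustration, and `Counter` merely stands in for the grouping and counting step that the paper performs in Pig.

```python
from collections import Counter
from math import hypot


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on (lat, lon) pairs; returns (centroids, labels)."""
    # Assumption: seed the centroids with k evenly spaced input points so the
    # run is deterministic. Real libraries use smarter seeding strategies.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: hypot(p[0] - centroids[c][0], p[1] - centroids[c][1]),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Hypothetical (latitude, longitude) records, not taken from the real dataset.
points = [
    (28.61, 77.21), (19.08, 72.88), (13.08, 80.27),
    (28.70, 77.10), (19.00, 72.83), (13.00, 80.20),
    (28.55, 77.25),
]

centroids, labels = kmeans(points, k=3)

# Counting step: number of records per cluster, analogous to grouping by
# cluster number and counting the rows in each group.
users_per_cluster = Counter(labels)
print(dict(users_per_cluster))
```

Treating degrees of latitude and longitude as flat Euclidean coordinates is a simplification that distorts distances over large areas; it is used here only to keep the sketch short.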

Keywords: Spark, Pig, clustering, mobile, data, analysis


DOI: https://doi.org/10.26483/ijarcs.v8i1.2869

Copyright (c) 2017 International Journal of Advanced Research in Computer Science