Spam: A Big Data Challenge


Liny Varghese
Supriya M.H
K Poulose Jacob


Spam comes in many forms, including text, images, embedded HTML and MIME attachments, and the volume of spam mail sent per day is massive. Handling this high volume, high velocity and wide variety of spam requires a scalable filtering solution. Scalable solutions developed for machine learning and statistical studies can also be applied to spam filtering. In the big data analytics domain, Apache Mahout is an open-source library for building scalable machine learning solutions. This paper uses the Mahout framework to analyse the time efficiency and accuracy of two Naïve Bayes classification algorithms.

Keywords: Apache Mahout, big data, scalable algorithms, Naïve Bayes algorithms


