Logistic regression and its implementation for email spam filtering

K. Srikanth
S. Ramakrishna, K.V.S. Sarma

Abstract

This paper reports an experiment on spam filtering with logistic regression, in which the efficiency of the filter is influenced by the characteristics of the frequency distribution of the tokens. The discussion focuses on the need for data cleaning before the model is developed: inconsistent features should be identified and set aside rather than included in the model. The model is built on the UCI dataset, which gives the token counts in each mail as percentages, and the discriminating ability of the filter is studied using the ROC curve.
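
The workflow summarised in the abstract (fit a logistic regression on token percentages, then judge its discriminating ability from the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic stand-ins rather than the UCI dataset, the two token features ("free" and "meeting") are hypothetical, and the helper names, learning rate and epoch count are assumptions chosen only for illustration. The area under the ROC curve (AUC) is computed as the probability that a randomly chosen spam mail receives a higher score than a randomly chosen legitimate one.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the token coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def roc_auc(y_true, scores):
    """AUC: chance that a random spam mail outscores a random non-spam mail (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_mails(n, rng):
    # Hypothetical stand-in data, NOT the UCI dataset: each mail has two
    # token percentages, for the made-up tokens "free" and "meeting".
    X, y = [], []
    for _ in range(n):
        spam = rng.random() < 0.4
        free = max(0.0, rng.gauss(1.5 if spam else 0.2, 0.5))
        meeting = max(0.0, rng.gauss(0.1 if spam else 1.0, 0.5))
        X.append([free, meeting])
        y.append(int(spam))
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic_mails(400, rng)
X_test, y_test = make_synthetic_mails(200, rng)

w = fit_logistic(X_train, y_train)
scores = [predict_proba(w, x) for x in X_test]
auc = roc_auc(y_test, scores)
print(f"AUC on held-out synthetic mails: {auc:.3f}")
```

An AUC near 0.5 would indicate a filter that discriminates no better than chance, while values near 1 indicate strong separation between spam and legitimate mail. On real data, the cleaning step the abstract calls for would come before `fit_logistic`.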


Keywords: spam, ROC curve, logistic regression, UCI data.
