Logistic regression and its implementation for email spam filtering

K. Srikanth, S. Ramakrishna, K.V.S. Sarma

Abstract


This paper reports an experiment on spam filtering with logistic regression, in which the efficiency of the filter is influenced by the characteristics of the frequency distribution of the tokens. The discussion focuses on the need for data cleaning before the model is developed: inconsistent features should be separated out rather than included in the model. The UCI dataset, which gives the token counts in each mail as percentages, is used to fit the model, and the discriminating ability of the filter is studied with the help of the ROC curve.
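
As a rough illustration of the approach summarised above, the following sketch fits a logistic regression filter by gradient descent and measures its discriminating ability as the area under the ROC curve. It is not the authors' implementation. The data are synthetic (two made-up token-percentage features), not the UCI dataset, and every name in it (`fit_logistic`, `auc`, `make_synthetic`, the learning rate, the number of epochs) is an assumption chosen for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one mail: P(spam | xi) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Return the fitted spam probability for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n, rng):
    # Hypothetical data: feature 0 mimics a token that is more frequent in
    # spam, feature 1 a token that is more frequent in legitimate mail.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        f0 = max(0.0, rng.gauss(1.5 if label else 0.3, 0.6))
        f1 = max(0.0, rng.gauss(0.2 if label else 1.0, 0.5))
        X.append([f0, f1])
        y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_synthetic(400, rng)
X_test, y_test = make_synthetic(200, rng)

w, b = fit_logistic(X_train, y_train)
test_auc = auc(predict_proba(X_test, w, b), y_test)
print(f"weights={w}, bias={b:.3f}, test AUC={test_auc:.3f}")
```

An AUC close to 1 indicates that the filter ranks spam above legitimate mail almost always, while 0.5 corresponds to random ranking. With real data, the same `auc` function could be applied before and after removing suspect features to see how cleaning affects the filter.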


Keywords: spam, ROC curve, logistic regression, UCI data.



DOI: https://doi.org/10.26483/ijarcs.v3i5.1391


Copyright (c) 2016 International Journal of Advanced Research in Computer Science