Publication: Incremental adaptive spam mail filtering using naïve Bayesian classification
Issued Date
2009-12-10
Other identifier(s)
2-s2.0-71249134800
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
10th ACIS Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2009, In conjunction with IWEA 2009 and WEACR 2009. (2009), 243-248
Suggested Citation
Phimphaka Taninpong, Sudsanguan Ngamsuriyaroj. Incremental adaptive spam mail filtering using naïve Bayesian classification. 10th ACIS Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2009, In conjunction with IWEA 2009 and WEACR 2009. (2009), 243-248. doi:10.1109/SNPD.2009.45 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/27448
Title
Incremental adaptive spam mail filtering using naïve Bayesian classification
Author(s)
Phimphaka Taninpong
Sudsanguan Ngamsuriyaroj
Abstract
Most content-based spam filters are rule-based or trained off-line. Such filters handle new spam tactics poorly and are prone to high misclassification rates. This paper proposes an incremental adaptive spam mail filtering scheme using naïve Bayesian classification, which offers good performance, simplicity, and adaptability. We model an incremental scheme that receives a stream of emails and applies a sliding window, training only on the last w emails to classify each new incoming message. The new features of each tested message are then added to the existing feature set so that the model adapts to future incoming emails. The proposed model is tested on two corpora: Trec05p-1 [11] and Trec06p [12]. The parameters varied are the window size and the number of features; the evaluation metrics are the processing time per message and the ham and spam misclassification rates. The experimental results show that the number of features has little impact, whereas the window size significantly affects both the misclassification rates and the processing time. In addition, the overall accuracy is even better than that of batch off-line training, and the processing time is significantly reduced. © 2009 IEEE.
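To make the sliding-window idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a multinomial naïve Bayes model with Laplace smoothing, a simple lowercase-word tokenizer, and immediate true-label feedback after each message is classified (test first, then train). The names `SlidingWindowNB`, `tokenize`, `run_stream`, and `misclassification_rates` are hypothetical. Word counts cover only the last w labeled emails, while the vocabulary keeps growing with the features of every tested message. Feature selection (the paper's "number of features" parameter) is omitted.

```python
import math
import re
from collections import Counter, deque

LABELS = ("ham", "spam")


def tokenize(text: str) -> Counter:
    # Hypothetical feature extraction: lowercase alphabetic words and their counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


class SlidingWindowNB:
    """Hypothetical sketch: multinomial naive Bayes whose counts cover only the
    last `w` labeled emails, while the vocabulary keeps growing."""

    def __init__(self, w: int):
        self.w = w
        self.window = deque()                  # (tokens, label) pairs, oldest first
        self.vocab = set()                     # grows with every message seen
        self.word_counts = {l: Counter() for l in LABELS}
        self.total_words = {l: 0 for l in LABELS}
        self.doc_counts = {l: 0 for l in LABELS}

    def _apply(self, tokens: Counter, label: str, sign: int) -> None:
        # Add (sign=+1) or remove (sign=-1) one email's counts.
        if sign > 0:
            self.word_counts[label].update(tokens)
        else:
            self.word_counts[label].subtract(tokens)
        self.total_words[label] += sign * sum(tokens.values())
        self.doc_counts[label] += sign

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        self.vocab.update(tokens)              # new features join the feature set
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = LABELS[0], -math.inf
        for label in LABELS:                   # strict ">" means ties go to "ham"
            # Laplace-smoothed log prior plus log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            denom = self.total_words[label] + v
            for tok, c in tokens.items():
                score += c * math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.vocab.update(tokens)
        self._apply(tokens, label, +1)
        self.window.append((tokens, label))
        if len(self.window) > self.w:          # slide: forget the oldest email
            old_tokens, old_label = self.window.popleft()
            self._apply(old_tokens, old_label, -1)


def misclassification_rates(truth, predicted):
    # Ham rate: share of ham labeled spam; spam rate: share of spam labeled ham.
    ham = [p for t, p in zip(truth, predicted) if t == "ham"]
    spam = [p for t, p in zip(truth, predicted) if t == "spam"]
    ham_rate = sum(p == "spam" for p in ham) / len(ham) if ham else 0.0
    spam_rate = sum(p == "ham" for p in spam) / len(spam) if spam else 0.0
    return ham_rate, spam_rate


def run_stream(stream, window: int):
    model = SlidingWindowNB(window)
    truth, predicted = [], []
    for text, label in stream:
        predicted.append(model.classify(text))  # test on the current window...
        truth.append(label)
        model.learn(text, label)                # ...then train on the labeled email
    return misclassification_rates(truth, predicted)


if __name__ == "__main__":
    stream = [
        ("cheap pills buy now", "spam"),
        ("meeting agenda for monday", "ham"),
        ("buy cheap pills", "spam"),
        ("monday meeting notes", "ham"),
    ]
    print(run_stream(stream, window=3))  # → (0.0, 0.5)
```

In this sketch each update adds the newest message's counts and subtracts those of the evicted one, so the cost per message depends on message length rather than on retraining over all w emails. That is one plausible way a small window reduces per-message processing time; the paper's exact mechanism may differ.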