Email classification and meeting scheduling using classifier algorithms
Abstract
This research compares two approaches to classifying emails by category. Two machine learning algorithms, Naive Bayes and the Hidden Markov Model (HMM), are used to detect whether an email is important or spam. The Naive Bayes classifier is based on conditional probabilities. It is fast, works well with small datasets, and treats each word as an independent feature. An HMM is a generative, probabilistic model that provides a distribution over sequences of observations. HMMs can handle inputs of variable length and help a program reach the most likely decision based on both previous decisions and current data. Various combinations of NLP techniques (stop-word removal, stemming, and lemmatization) are applied with both algorithms to examine the differences in accuracy and to find the best-performing combination. Along with email classification, this paper describes the methodologies used for automatic meeting scheduling by an intelligent email assistant. Users who regularly send or receive messages to set up meetings will benefit greatly from this system, as it classifies their emails and schedules their meetings automatically.
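To illustrate the Naive Bayes side of the comparison, the following is a minimal sketch of a word-based classifier with stop-word removal. It is not the implementation evaluated in this paper: the stop-word list, the helper names (`tokenize`, `train`, `predict`), the four training emails, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOPWORDS = {"the", "a", "to", "is", "at", "for", "and", "you", "your", "our", "on"}


def tokenize(text, remove_stopwords=True):
    """Lowercase, split on whitespace, strip basic punctuation, drop stop words."""
    tokens = [t.strip(".,!?:;").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def train(examples):
    """Count emails per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior.

    Each word is treated as an independent feature given the class,
    and add-one smoothing avoids zero probabilities (an assumption here).
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch.
TRAIN = [
    ("Win a free prize now", "spam"),
    ("Cheap loans, claim your free offer", "spam"),
    ("Meeting at 10am tomorrow about the project", "important"),
    ("Please review the project report before our meeting", "important"),
]
model = train(TRAIN)
```

With this toy data, `predict(model, "free prize offer")` selects `spam` and `predict(model, "project meeting tomorrow")` selects `important`. Stemming or lemmatization could be added inside `tokenize` to reproduce the other preprocessing combinations mentioned above.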