What is relevant in a text document? A machine learning based approach
Abstract
Text documents often contain valuable data, but not all of that data is relevant.
Extracting the relevant data from text documents is therefore an essential task. Here,
extracting relevant data means classifying text documents into groups that describe
their contents. Many methods exist for finding relevant data in a collection of texts
or in a single text document. Classifying large volumes of textual data helps to
organize records better, makes search easier and more relevant, and simplifies
navigation. This makes text classification an important, and still open, research
issue. This paper uses three techniques to classify text documents: convolutional
neural networks (CNNs), a deep learning method; Gaussian Naïve Bayes; and support
vector machines (SVMs). With these three algorithms, the text to be classified passes
through three layers of checks, which makes the classification more reliable.
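The abstract does not state how the outputs of the three classifiers are combined. One plausible reading of "three layers of checks" is a majority vote, and the Python sketch below assumes that rule. The names `majority_vote`, `cnn_stub`, `gnb_stub` and `svm_stub` are hypothetical, and the three stubs are simple keyword rules that only stand in for trained CNN, Gaussian Naïve Bayes and SVM models.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# A classifier is modelled as any function mapping a document to a label.
Classifier = Callable[[str], str]


def majority_vote(doc: str, classifiers: Sequence[Classifier]) -> Tuple[str, float]:
    """Combine several classifiers by majority vote (an assumed combination rule).

    Returns the winning label and the fraction of classifiers that chose it.
    With equal vote counts, the label seen first (earliest classifier) wins,
    because Counter.most_common orders ties by first insertion.
    """
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = [clf(doc) for clf in classifiers]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Hypothetical keyword rules standing in for the trained CNN,
# Gaussian Naive Bayes and SVM models; they are NOT real models.
def cnn_stub(doc: str) -> str:
    return "sports" if "match" in doc else "politics"


def gnb_stub(doc: str) -> str:
    return "sports" if "goal" in doc else "politics"


def svm_stub(doc: str) -> str:
    return "sports"


if __name__ == "__main__":
    label, agreement = majority_vote("the match ended 2-1", [cnn_stub, gnb_stub, svm_stub])
    print(label, round(agreement, 2))  # sports 0.67
```

Returning the agreement fraction alongside the label lets a caller treat a unanimous 3/3 decision as more trustworthy than a 2/3 split, which is one way extra checks could add reliability under this assumption.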