Important keywords extraction from documents using semantic analysis
Abstract
Keyword extraction is the automatic selection of terms that describe the content of a document. Keywords are the terms that represent the core information in a document. When a massive number of documents must be searched for relevant information, keyword extraction is a key approach, because it helps us understand what a document is about even before we read it. In this research, we review the approaches and algorithms that have been used for keyword extraction. Conditional random fields (CRF), support vector machines (SVM), NP-chunks, n-grams, multiple linear regression, logistic regression (logit), and semantic analysis have all been used to find the important keywords in a document. Prior research shows that SVM and CRF give better results than the other methods, with CRF outperforming SVM on F1 score (the balance between precision and recall). On precision, SVM performs better than CRF, while on recall, logistic regression gives the best result. The semantic relation between words is another key feature in keyword extraction techniques. Semantic analysis is a very effective area of natural language processing, and by using semantic relations it is possible to identify relationships between words as well as between lines. In this thesis, we use semantic analysis and document processing to find the important keywords in documents.
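For readers unfamiliar with the metrics compared above, the following is a minimal sketch of how precision, recall, and F1 could be computed for one document's extracted keywords. The function name `keyword_prf` and the example keyword lists are illustrative assumptions, not taken from the experiments in this thesis, and exact string matching against a reference set is only one possible matching rule.

```python
def keyword_prf(extracted, gold):
    """Return (precision, recall, f1) for extracted keywords vs. a reference set.

    Assumes exact string matching; both arguments are iterables of strings.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted keywords that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of reference keywords that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data, for illustration only.
extracted = ["keyword extraction", "semantic analysis", "document", "svm"]
gold = ["keyword extraction", "semantic analysis", "crf"]

precision, recall, f1 = keyword_prf(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a method can lead on precision or on recall alone and still score lower on F1 than a method that is more balanced between the two.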