Important keywords extraction from documents using semantic analysis

Hasan, H. M. Mahedi; Sanyal, Falguni

Date

2017-04

Abstract

Keyword extraction is the automatic selection of terms that describe the content of a document. Keywords are the terms that represent a document's core information. When relevant information must be found in a massive number of documents, keyword extraction is a key approach: it helps us understand the substance of a document even before we read it. In this research, we surveyed the approaches and algorithms that have been used for keyword extraction. Conditional random fields (CRF), support vector machines (SVM), NP-chunking, N-grams, multiple linear regression, logistic regression, and semantic analysis have all been used to find important keywords in a document. Extensive research shows that SVM and CRF give better results, with CRF outperforming SVM on F1 score (the balance between precision and recall). On precision, SVM performs better than CRF, while on recall, logistic regression (logit) gives the best result. The semantic relation between words is another key feature in keyword extraction techniques. Semantic analysis is a very effective area of natural language processing, and semantic relations make it possible to find relationships between words as well as between lines. In this thesis, we use semantic analysis and document processing to find the important keywords in documents.
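
The abstract compares methods by precision, recall, and F1 score. As a minimal sketch of how these three measures can be computed for a set of extracted keywords, the snippet below scores a predicted keyword list against a reference list. The function name `precision_recall_f1`, the case-insensitive exact-match rule, and the sample keyword lists are illustrative assumptions, not the evaluation procedure used in the thesis.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted keywords against a reference set.

    Assumption: a predicted keyword counts as correct only if it matches
    a reference keyword exactly after lowercasing and trimming whitespace.
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}

    true_positives = len(pred & ref)

    # Precision: share of extracted keywords that are correct.
    precision = true_positives / len(pred) if pred else 0.0
    # Recall: share of reference keywords that were extracted.
    recall = true_positives / len(ref) if ref else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, for illustration only.
extracted = ["Semantic analysis", "keyword", "document", "N-grams"]
reference = ["semantic analysis", "n-grams", "pos-tagging"]

p, r, f = precision_recall_f1(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With these sample lists, two of the four extracted keywords are correct and two of the three reference keywords are found, so precision is lower than recall. F1 falls between the two, which is why it is used as a single balanced figure when comparing methods.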

Keywords

Natural Language Processing (NLP); Semantic analysis; TextBlob; POS-tagging; N-grams; Keyword

Description

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.

Cataloged from PDF version of thesis report.

Includes bibliographical references (pages 50-51).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)